Following on from the previous patch for strided accesses, this patch
allows gather loads to be used with grouped accesses, if we otherwise
would need to fall back to VMAT_ELEMENTWISE. However, as the comment
says, this is restricted to single-element groups for now:
  ??? Although the code can handle all group sizes correctly,
  it probably isn't a win to use separate strided accesses based
  on nearby locations. Or, even if it's a win over scalar code,
  it might not be a win over vectorizing at a lower VF, if that
  allows us to use contiguous accesses.
Single-element groups are an important special case though,
and this means that code is less sensitive to GCC's classification
of single accesses with constant steps as "grouped" and ones with
variable steps as "strided".
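For illustration, a hypothetical loop of this shape (the names are
invented) forms a single-element group with a constant step, since
only s[i].x is accessed:

  struct pair { int x, y; };

  int
  sum_x (struct pair *s, int n)
  {
    int res = 0;
    for (int i = 0; i < n; ++i)
      res += s[i].x;  /* group size 1, constant step of sizeof (s[0]) */
    return res;
  }

With this patch the load can be implemented as a strided gather
rather than falling back to VMAT_ELEMENTWISE.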
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (vect_gather_scatter_fn_p): Declare.
* tree-vect-data-refs.c (vect_gather_scatter_fn_p): Make public.
* tree-vect-stmts.c (vect_truncate_gather_scatter_offset): New
function.
(vect_use_strided_gather_scatters_p): Take a masked_p argument.
Use vect_truncate_gather_scatter_offset if we can't treat the
operation as a normal gather load or scatter store.
(get_group_load_store_type): Take the gather_scatter_info
as argument. Try using a gather load or scatter store for
single-element groups.
(get_load_store_type): Update calls to get_group_load_store_type
and vect_use_strided_gather_scatters_p.
gcc/testsuite/
* gcc.target/aarch64/sve/reduc_strict_3.c: Expect FADDA to be used
for double_reduc1.
* gcc.target/aarch64/sve/strided_load_4.c: New test.
* gcc.target/aarch64/sve/strided_load_5.c: Likewise.
* gcc.target/aarch64/sve/strided_load_6.c: Likewise.
* gcc.target/aarch64/sve/strided_load_7.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256642
This patch tries to use gather loads for strided accesses,
rather than falling back to VMAT_ELEMENTWISE.
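A hypothetical example of the kind of access this covers; the step is
a runtime value, so GCC classifies the load as "strided":

  int
  sum_strided (int *a, int stride, int n)
  {
    int res = 0;
    for (int i = 0; i < n; ++i)
      res += a[i * stride];  /* variable step: gather load candidate */
    return res;
  }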
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (vect_create_data_ref_ptr): Take an extra
optional tree argument.
* tree-vect-data-refs.c (vect_check_gather_scatter): Check for
null target hooks.
(vect_create_data_ref_ptr): Take the iv_step as an optional argument,
but continue to use the current value as a fallback.
(bump_vector_ptr): Use operand_equal_p rather than tree_int_cst_compare
to compare the updates.
* tree-vect-stmts.c (vect_use_strided_gather_scatters_p): New function.
(get_load_store_type): Use it when handling a strided access.
(vect_get_strided_load_store_ops): New function.
(vect_get_data_ptr_increment): Likewise.
(vectorizable_load): Handle strided gather loads. Always pass
a step to vect_create_data_ref_ptr and bump_vector_ptr.
gcc/testsuite/
* gcc.target/aarch64/sve/strided_load_1.c: New test.
* gcc.target/aarch64/sve/strided_load_2.c: Likewise.
* gcc.target/aarch64/sve/strided_load_3.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256641
This patch adds support for SVE gather loads. It uses basically
the same analysis code as the AVX gather support, but after that
there are two major differences:
- It uses new internal functions rather than target built-ins.
The interface is:
  IFN_GATHER_LOAD (base, offsets, scale)
  IFN_MASK_GATHER_LOAD (base, offsets, scale, mask)
which should be reasonably generic. One of the advantages of
using internal functions is that other passes can understand what
the functions do, but a more immediate advantage is that we can
query the underlying target pattern to see which scales it supports.
- It uses pattern recognition to convert the offset to the right width,
if it was originally narrower than that. This avoids having to do
a widening operation as part of the gather expansion itself.
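For example, a hypothetical loop like the following (names invented)
can be vectorized with the new internal functions, the pattern code
first widening idx[i] to the offset width:

  void
  gather (int *restrict dst, int *restrict src,
          short *restrict idx, int n)
  {
    for (int i = 0; i < n; ++i)
      dst[i] = src[idx[i]];  /* load becomes a gather internal function */
  }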
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (gather_load@var{m}): Document.
(mask_gather_load@var{m}): Likewise.
* genopinit.c (main): Add supports_vec_gather_load and
supports_vec_gather_load_cached to target_optabs.
* optabs-tree.c (init_tree_optimization_optabs): Use
ggc_cleared_alloc to allocate target_optabs.
* optabs.def (gather_load_optab, mask_gather_load_optab): New optabs.
* internal-fn.def (GATHER_LOAD, MASK_GATHER_LOAD): New internal
functions.
* internal-fn.h (internal_load_fn_p): Declare.
(internal_gather_scatter_fn_p): Likewise.
(internal_fn_mask_index): Likewise.
(internal_gather_scatter_fn_supported_p): Likewise.
* internal-fn.c (gather_load_direct): New macro.
(expand_gather_load_optab_fn): New function.
(direct_gather_load_optab_supported_p): New macro.
(direct_internal_fn_optab): New function.
(internal_load_fn_p): Likewise.
(internal_gather_scatter_fn_p): Likewise.
(internal_fn_mask_index): Likewise.
(internal_gather_scatter_fn_supported_p): Likewise.
* optabs-query.c (supports_at_least_one_mode_p): New function.
(supports_vec_gather_load_p): Likewise.
* optabs-query.h (supports_vec_gather_load_p): Declare.
* tree-vectorizer.h (gather_scatter_info): Add ifn, element_type
and memory_type fields.
(NUM_PATTERNS): Bump to 15.
* tree-vect-data-refs.c: Include internal-fn.h.
(vect_gather_scatter_fn_p): New function.
(vect_describe_gather_scatter_call): Likewise.
(vect_check_gather_scatter): Try using internal functions for
gather loads. Recognize existing calls to a gather load function.
(vect_analyze_data_refs): Consider using gather loads if
supports_vec_gather_load_p.
* tree-vect-patterns.c (vect_get_load_store_mask): New function.
(vect_get_gather_scatter_offset_type): Likewise.
(vect_convert_mask_for_vectype): Likewise.
(vect_add_conversion_to_patterm): Likewise.
(vect_try_gather_scatter_pattern): Likewise.
(vect_recog_gather_scatter_pattern): New pattern recognizer.
(vect_vect_recog_func_ptrs): Add it.
* tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use
internal_fn_mask_index and internal_gather_scatter_fn_p.
(check_load_store_masking): Take the gather_scatter_info as an
argument and handle gather loads.
(vect_get_gather_scatter_ops): New function.
(vectorizable_call): Check internal_load_fn_p.
(vectorizable_load): Likewise. Handle gather load internal
functions.
(vectorizable_store): Update call to check_load_store_masking.
* config/aarch64/aarch64.md (UNSPEC_LD1_GATHER): New unspec.
* config/aarch64/iterators.md (SVE_S, SVE_D): New mode iterators.
* config/aarch64/predicates.md (aarch64_gather_scale_operand_w)
(aarch64_gather_scale_operand_d): New predicates.
* config/aarch64/aarch64-sve.md (gather_load<mode>): New expander.
(mask_gather_load<mode>): New insns.
gcc/testsuite/
* gcc.target/aarch64/sve/gather_load_1.c: New test.
* gcc.target/aarch64/sve/gather_load_2.c: Likewise.
* gcc.target/aarch64/sve/gather_load_3.c: Likewise.
* gcc.target/aarch64/sve/gather_load_4.c: Likewise.
* gcc.target/aarch64/sve/gather_load_5.c: Likewise.
* gcc.target/aarch64/sve/gather_load_6.c: Likewise.
* gcc.target/aarch64/sve/gather_load_7.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_1.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_2.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_3.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_4.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_5.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256640
This patch adds support for in-order floating-point addition reductions,
which are suitable even in strict IEEE mode.
Previously vect_is_simple_reduction would reject any cases that forbid
reassociation. The idea is instead to tentatively accept them as
"FOLD_LEFT_REDUCTIONs" and only fail later if there is no support
for them. Although this patch only handles the particular case of plus
and minus on floating-point types, there's no reason in principle why
we couldn't handle other cases.
The reductions use a new fold_left_plus_optab if available, otherwise
they fall back to elementwise additions or subtractions.
The vect_force_simple_reduction change makes it easier for parloops
to read the type of reduction.
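A minimal example (hypothetical) of a reduction that previously could
not be vectorized without -ffast-math, since IEEE semantics forbid
reassociating the additions:

  double
  sum (double *x, int n)
  {
    double res = 0.0;
    for (int i = 0; i < n; ++i)
      res += x[i];  /* must be accumulated strictly in order */
    return res;
  }

With this patch it is treated as a FOLD_LEFT_REDUCTION and can use
FADDA on SVE.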
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* optabs.def (fold_left_plus_optab): New optab.
* doc/md.texi (fold_left_plus_@var{m}): Document.
* internal-fn.def (IFN_FOLD_LEFT_PLUS): New internal function.
* internal-fn.c (fold_left_direct): Define.
(expand_fold_left_optab_fn): Likewise.
(direct_fold_left_optab_supported_p): Likewise.
* fold-const-call.c (fold_const_fold_left): New function.
(fold_const_call): Use it to fold CFN_FOLD_LEFT_PLUS.
* tree-parloops.c (valid_reduction_p): New function.
(gather_scalar_reductions): Use it.
* tree-vectorizer.h (FOLD_LEFT_REDUCTION): New vect_reduction_type.
(vect_finish_replace_stmt): Declare.
* tree-vect-loop.c (fold_left_reduction_fn): New function.
(needs_fold_left_reduction_p): New function, split out from...
(vect_is_simple_reduction): ...here. Accept reductions that
forbid reassociation, but give them type FOLD_LEFT_REDUCTION.
(vect_force_simple_reduction): Also store the reduction type in
the assignment's STMT_VINFO_REDUC_TYPE.
(vect_model_reduction_cost): Handle FOLD_LEFT_REDUCTION.
(merge_with_identity): New function.
(vect_expand_fold_left): Likewise.
(vectorize_fold_left_reduction): Likewise.
(vectorizable_reduction): Handle FOLD_LEFT_REDUCTION. Leave the
scalar phi in place for it. Check for target support and reject
cases that would reassociate the operation. Defer the transform
phase to vectorize_fold_left_reduction.
* config/aarch64/aarch64.md (UNSPEC_FADDA): New unspec.
* config/aarch64/aarch64-sve.md (fold_left_plus_<mode>): New expander.
(*fold_left_plus_<mode>, *pred_fold_left_plus_<mode>): New insns.
gcc/testsuite/
* gcc.dg/vect/no-fast-math-vect16.c: Expect the test to pass and
check for a message about using in-order reductions.
* gcc.dg/vect/pr79920.c: Expect both loops to be vectorized and
check for a message about using in-order reductions.
* gcc.dg/vect/trapv-vect-reduc-4.c: Expect all three loops to be
vectorized and check for a message about using in-order reductions.
Expect targets with variable-length vectors to fall back to the
fixed-length minimum.
* gcc.dg/vect/vect-reduc-6.c: Expect the loop to be vectorized and
check for a message about using in-order reductions.
* gcc.dg/vect/vect-reduc-in-order-1.c: New test.
* gcc.dg/vect/vect-reduc-in-order-2.c: Likewise.
* gcc.dg/vect/vect-reduc-in-order-3.c: Likewise.
* gcc.dg/vect/vect-reduc-in-order-4.c: Likewise.
* gcc.target/aarch64/sve/reduc_strict_1.c: New test.
* gcc.target/aarch64/sve/reduc_strict_1_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_strict_2.c: Likewise.
* gcc.target/aarch64/sve/reduc_strict_2_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_strict_3.c: Likewise.
* gcc.target/aarch64/sve/slp_13.c: Add floating-point types.
* gfortran.dg/vect/vect-8.f90: Expect 22 loops to be vectorized if
vect_fold_left_plus.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256639
The call to ifc_temp_var in predicate_mem_writes became redundant
in r230099. Before that point the mask was calculated using
fold_build_*s, but now it's calculated by gimple_build and so
is already a valid gimple value.
As it stands, the call forces an SSA_NAME-to-SSA_NAME copy
to be created, whereas SLP expects that such redundant copies
have already been eliminated.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-if-conv.c (predicate_mem_writes): Remove redundant
call to ifc_temp_var.
From-SVN: r256638
This patch:
- tweaks the handling of legitimize_address_displacement
so that it gets called before rather than after the address has
been expanded. This means that we're no longer at the mercy
of LRA being able to interpret the expanded instructions.
- passes the original offset to legitimize_address_displacement.
- adds SVE support to the AArch64 implementation of
legitimize_address_displacement.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* target.def (legitimize_address_displacement): Take the original
offset as a poly_int.
* targhooks.h (default_legitimize_address_displacement): Update
accordingly.
* targhooks.c (default_legitimize_address_displacement): Likewise.
* doc/tm.texi: Regenerate.
* lra-constraints.c (base_plus_disp_to_reg): Take the displacement
as an argument, moving assert of ad->disp == ad->disp_term to...
(process_address_1): ...here. Update calls to base_plus_disp_to_reg.
Try calling targetm.legitimize_address_displacement before expanding
the address rather than afterwards, and adjust for the new interface.
* config/aarch64/aarch64.c (aarch64_legitimize_address_displacement):
Match the new hook interface. Handle SVE addresses.
* config/sh/sh.c (sh_legitimize_address_displacement): Match the
new hook interface.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256637
This patch looks for pseudo registers that are live across a call
and for which no call-preserved hard registers exist. It then
recomputes the pseudos as necessary to ensure that they are no
longer live across a call. The comment at the head of the file
describes the approach.
A new target hook selects which modes should be treated in this way.
By default none are, in which case the pass is skipped very early.
It might also be worth looking for cases like:
  C1: R1 := f (...)
  ...
  C2: R2 := f (...)
  C3: R1 := C2

and giving the same value number to C1 and C3, effectively treating
it like:

  C1: R1 := f (...)
  ...
  C2: R2 := f (...)
  C3: R1 := f (...)
Another (much more expensive) enhancement would be to apply value
numbering to all pseudo registers (not just rematerialisation
candidates), so that we can handle things like:
  C1: R1 := f (...R2...)
  ...
  C2: R1 := f (...R3...)
where R2 and R3 hold the same value. But the current pass seems
to catch the vast majority of cases.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* Makefile.in (OBJS): Add early-remat.o.
* target.def (select_early_remat_modes): New hook.
* doc/tm.texi.in (TARGET_SELECT_EARLY_REMAT_MODES): New hook.
* doc/tm.texi: Regenerate.
* targhooks.h (default_select_early_remat_modes): Declare.
* targhooks.c (default_select_early_remat_modes): New function.
* timevar.def (TV_EARLY_REMAT): New timevar.
* passes.def (pass_early_remat): New pass.
* tree-pass.h (make_pass_early_remat): Declare.
* early-remat.c: New file.
* config/aarch64/aarch64.c (aarch64_select_early_remat_modes): New
function.
(TARGET_SELECT_EARLY_REMAT_MODES): Define.
gcc/testsuite/
* gcc.target/aarch64/sve/spill_1.c: Also test that no predicates
are spilled.
* gcc.target/aarch64/sve/spill_2.c: New test.
* gcc.target/aarch64/sve/spill_3.c: Likewise.
* gcc.target/aarch64/sve/spill_4.c: Likewise.
* gcc.target/aarch64/sve/spill_5.c: Likewise.
* gcc.target/aarch64/sve/spill_6.c: Likewise.
* gcc.target/aarch64/sve/spill_7.c: Likewise.
From-SVN: r256636
This patch adds support for fully-masking loops that require peeling
for gaps. It peels exactly one scalar iteration and uses the masked
loop to handle the rest. Previously we would fall back on using a
standard unmasked loop instead.
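A sketch of a loop that needs peeling for gaps (names invented): the
y field is never read, so the last vector access to the group could
otherwise run past the end of the array:

  struct pair { int x, y; };

  void
  copy_x (int *restrict dst, struct pair *restrict s, int n)
  {
    for (int i = 0; i < n; ++i)
      dst[i] = s[i].x;  /* group of size 2 with a gap */
  }

With this patch only one scalar iteration is peeled and the rest of
the loop stays fully masked.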
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Replace
vfm1 with a bound_epilog parameter.
(vect_do_peeling): Update calls accordingly, and move the prologue
call earlier in the function. Treat the base bound_epilog as 0 for
fully-masked loops and retain vf - 1 for other loops. Add 1 to
this base when peeling for gaps.
* tree-vect-loop.c (vect_analyze_loop_2): Allow peeling for gaps
with fully-masked loops.
(vect_estimate_min_profitable_iters): Handle the single peeled
iteration in that case.
gcc/testsuite/
* gcc.target/aarch64/sve/struct_vect_18.c: Check the number
of branches.
* gcc.target/aarch64/sve/struct_vect_19.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_20.c: New test.
* gcc.target/aarch64/sve/struct_vect_20_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_21.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_21_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_22.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_22_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_23.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_23_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256635
This allows LD3 to be used for isolated a[i * 3] accesses, in a similar
way to the existing use of LD2 and LD4 for a[i * 2] and a[i * 4]
respectively.
Given the problems with the cost model underestimating the cost of
elementwise accesses, the patch continues to reject the VMAT_ELEMENTWISE
cases that are currently rejected.
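For example (hypothetical names):

  void
  extract (int *restrict dst, int *restrict src, int n)
  {
    for (int i = 0; i < n; ++i)
      dst[i] = src[i * 3];  /* single-element interleaving, group size 3 */
  }

can now be implemented with LD3, keeping one of the three loaded
vectors and discarding the other two.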
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-data-refs.c (vect_analyze_group_access_1): Allow
single-element interleaving even if the size is not a power of 2.
* tree-vect-stmts.c (get_load_store_type): Disallow elementwise
accesses for single-element interleaving if the group size is
not a power of 2.
gcc/testsuite/
* gcc.target/aarch64/sve/struct_vect_18.c: New test.
* gcc.target/aarch64/sve/struct_vect_18_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_19.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_19_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256634
This patch uses SVE CLASTB to optimise conditional reductions. It means
that we no longer need to maintain a separate index vector to record
the most recent valid value, and no longer need to worry about overflow
cases.
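A hypothetical conditional reduction of the kind this helps: the
result is the b[i] from the last iteration whose condition held:

  int
  last_match (int *a, int *b, int n)
  {
    int res = -1;
    for (int i = 0; i < n; ++i)
      if (a[i] > 0)
        res = b[i];
    return res;
  }

CLASTB extracts the last active element directly, instead of tracking
the index of the most recent match in a separate vector.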
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (fold_extract_last_@var{m}): Document.
* doc/sourcebuild.texi (vect_fold_extract_last): Likewise.
* optabs.def (fold_extract_last_optab): New optab.
* internal-fn.def (FOLD_EXTRACT_LAST): New internal function.
* internal-fn.c (fold_extract_direct): New macro.
(expand_fold_extract_optab_fn): Likewise.
(direct_fold_extract_optab_supported_p): Likewise.
* tree-vectorizer.h (EXTRACT_LAST_REDUCTION): New vect_reduction_type.
* tree-vect-loop.c (vect_model_reduction_cost): Handle
EXTRACT_LAST_REDUCTION.
(get_initial_def_for_reduction): Do not create an initial vector
for EXTRACT_LAST_REDUCTION reductions.
(vectorizable_reduction): Leave the scalar phi in place for
EXTRACT_LAST_REDUCTIONs. Try using EXTRACT_LAST_REDUCTION
ahead of INTEGER_INDUC_COND_REDUCTION. Do not check for
epilogue code for EXTRACT_LAST_REDUCTION and defer the
transform phase to vectorizable_condition.
* tree-vect-stmts.c (vect_finish_stmt_generation_1): New function,
split out from...
(vect_finish_stmt_generation): ...here.
(vect_finish_replace_stmt): New function.
(vectorizable_condition): Handle EXTRACT_LAST_REDUCTION.
* config/aarch64/aarch64-sve.md (fold_extract_last_<mode>): New
pattern.
* config/aarch64/aarch64.md (UNSPEC_CLASTB): New unspec.
gcc/testsuite/
* lib/target-supports.exp
(check_effective_target_vect_fold_extract_last): New proc.
* gcc.dg/vect/pr65947-1.c: Update dump messages. Add markup
for fold_extract_last.
* gcc.dg/vect/pr65947-2.c: Likewise.
* gcc.dg/vect/pr65947-3.c: Likewise.
* gcc.dg/vect/pr65947-4.c: Likewise.
* gcc.dg/vect/pr65947-5.c: Likewise.
* gcc.dg/vect/pr65947-6.c: Likewise.
* gcc.dg/vect/pr65947-9.c: Likewise.
* gcc.dg/vect/pr65947-10.c: Likewise.
* gcc.dg/vect/pr65947-12.c: Likewise.
* gcc.dg/vect/pr65947-14.c: Likewise.
* gcc.dg/vect/pr80631-1.c: Likewise.
* gcc.target/aarch64/sve/clastb_1.c: New test.
* gcc.target/aarch64/sve/clastb_1_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_2.c: Likewise.
* gcc.target/aarch64/sve/clastb_2_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_3.c: Likewise.
* gcc.target/aarch64/sve/clastb_3_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_4.c: Likewise.
* gcc.target/aarch64/sve/clastb_4_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_5.c: Likewise.
* gcc.target/aarch64/sve/clastb_5_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_6.c: Likewise.
* gcc.target/aarch64/sve/clastb_6_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_7.c: Likewise.
* gcc.target/aarch64/sve/clastb_7_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256633
This patch uses the SVE LASTB instruction to optimise cases in which
a value produced by the final scalar iteration of a vectorised loop is
live outside the loop. Previously this situation would stop us from
using a fully-masked loop.
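For example (hypothetical), the final value of res is live after this
loop:

  int
  last_value (int *a, int n)
  {
    int res = 0;
    for (int i = 0; i < n; ++i)
      res = a[i] * 2;  /* only the last iteration's value survives */
    return res;
  }

LASTB extracts that value from the last active lane of the final
vector iteration, so the loop can still be fully masked.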
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (extract_last_@var{m}): Document.
* optabs.def (extract_last_optab): New optab.
* internal-fn.def (EXTRACT_LAST): New internal function.
* internal-fn.c (cond_unary_direct): New macro.
(expand_cond_unary_optab_fn): Likewise.
(direct_cond_unary_optab_supported_p): Likewise.
* tree-vect-loop.c (vectorizable_live_operation): Allow fully-masked
loops using EXTRACT_LAST.
* config/aarch64/aarch64-sve.md (aarch64_sve_lastb<mode>): Rename to...
(extract_last_<mode>): ...this optab.
(vec_extract<mode><Vel>): Update accordingly.
gcc/testsuite/
* gcc.target/aarch64/sve/live_1.c: New test.
* gcc.target/aarch64/sve/live_1_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256632
This patch adds a hook to control whether we avoid executing masked
(predicated) stores when the mask is all false. We don't want to do
that by default for SVE.
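A sketch of the kind of override involved; the exact hook signature
is not shown in this log, so treat the shape below as an assumption:

  /* SVE predicated stores are cheap even when no lanes are active,
     so branching around them is not worthwhile.  */
  static bool
  aarch64_empty_mask_is_expensive (void)
  {
    return false;
  }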
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* target.def (empty_mask_is_expensive): New hook.
* doc/tm.texi.in (TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): New hook.
* doc/tm.texi: Regenerate.
* targhooks.h (default_empty_mask_is_expensive): Declare.
* targhooks.c (default_empty_mask_is_expensive): New function.
* tree-vectorizer.c (vectorize_loops): Only call optimize_mask_stores
if the target says that empty masks are expensive.
* config/aarch64/aarch64.c (aarch64_empty_mask_is_expensive):
New function.
(TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Redefine.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256631
This patch adds support for aligning vectors by using a partial
first iteration. E.g. if the start pointer is 3 elements beyond
an aligned address, the first iteration will have a mask in which
the first three elements are false.
On SVE, the optimisation is only useful for vector-length-specific
code. Vector-length-agnostic code doesn't try to align vectors
since the vector length might not be a power of 2.
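Hypothetical illustration: with 8-element vectors and a start pointer
3 elements past an aligned boundary, the mask for the first iteration
of this loop would be { 0, 0, 0, 1, 1, 1, 1, 1 }, after which every
vector access is aligned:

  void
  add_one (int *p, int n)  /* p is 3 elements past alignment */
  {
    for (int i = 0; i < n; ++i)
      p[i] += 1;
  }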
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (_loop_vec_info::mask_skip_niters): New field.
(LOOP_VINFO_MASK_SKIP_NITERS): New macro.
(vect_use_loop_mask_for_alignment_p): New function.
(vect_prepare_for_masked_peels, vect_gen_while_not): Declare.
* tree-vect-loop-manip.c (vect_set_loop_masks_directly): Add an
niters_skip argument. Make sure that the first niters_skip elements
of the first iteration are inactive.
(vect_set_loop_condition_masked): Handle LOOP_VINFO_MASK_SKIP_NITERS.
Update call to vect_set_loop_masks_directly.
(get_misalign_in_elems): New function, split out from...
(vect_gen_prolog_loop_niters): ...here.
(vect_update_init_of_dr): Take a code argument that specifies whether
the adjustment should be added or subtracted.
(vect_update_inits_of_drs): Likewise.
(vect_prepare_for_masked_peels): New function.
(vect_do_peeling): Skip prologue peeling if we're using a mask
instead. Update call to vect_update_inits_of_drs.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
mask_skip_niters.
(vect_analyze_loop_2): Allow fully-masked loops with peeling for
alignment. Do not include the number of peeled iterations in
the minimum threshold in that case.
(vectorizable_induction): Adjust the start value down by
LOOP_VINFO_MASK_SKIP_NITERS iterations.
(vect_transform_loop): Call vect_prepare_for_masked_peels.
Take the number of skipped iterations into account when calculating
the loop bounds.
* tree-vect-stmts.c (vect_gen_while_not): New function.
gcc/testsuite/
* gcc.target/aarch64/sve/nopeel_1.c: New test.
* gcc.target/aarch64/sve/peel_ind_1.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_1_run.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_2.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_2_run.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_3.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_3_run.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_4.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_4_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256630
Fully-masked loops can be profitable even if the iteration
count is smaller than the vectorisation factor. In this case
we're effectively doing a complete unroll followed by SLP.
The documentation for min-vect-loop-bound said that the
default value was 0, but actually the default and minimum
were 1. We need it to be 0 for this case since the parameter
counts a whole number of vector iterations.
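For example (hypothetical), with 256-bit SVE vectors of int (VF = 8),
this loop has fewer iterations than the vectorisation factor but can
still run as a single fully-masked vector iteration:

  void
  small (int *a, int *b)
  {
    for (int i = 0; i < 7; ++i)  /* 7 < VF */
      a[i] = b[i] + 1;
  }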
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/sourcebuild.texi (vect_fully_masked): Document.
* params.def (PARAM_MIN_VECT_LOOP_BOUND): Change minimum and
default value to 0.
* tree-vect-loop.c (vect_analyze_loop_costing): New function,
split out from...
(vect_analyze_loop_2): ...here. Don't check the vectorization
factor against the number of loop iterations if the loop is
fully-masked.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_fully_masked):
New proc.
* gcc.dg/vect/slp-3.c: Expect all loops to be vectorized if
vect_fully_masked.
* gcc.target/aarch64/sve/loop_add_4.c: New test.
* gcc.target/aarch64/sve/loop_add_4_run.c: Likewise.
* gcc.target/aarch64/sve/loop_add_5.c: Likewise.
* gcc.target/aarch64/sve/loop_add_5_run.c: Likewise.
* gcc.target/aarch64/sve/miniloop_1.c: Likewise.
* gcc.target/aarch64/sve/miniloop_2.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256629
ivopts previously treated pointer arguments to internal functions
like IFN_MASK_LOAD and IFN_MASK_STORE as normal gimple values.
This patch makes it treat them as addresses instead. This makes
a significant difference to the code quality for SVE loops,
since we can then use loads and stores with scaled indices.
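For example (hypothetical), the conditional accesses in:

  void
  cond_copy (int *restrict dst, int *restrict src,
             int *restrict c, int n)
  {
    for (int i = 0; i < n; ++i)
      if (c[i])
        dst[i] = src[i];
  }

are expanded through IFN_MASK_LOAD and IFN_MASK_STORE; treating their
pointer operands as addresses lets ivopts generate scaled-index forms
rather than separately incremented pointers.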
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-ssa-loop-ivopts.c (USE_ADDRESS): Split into...
(USE_REF_ADDRESS, USE_PTR_ADDRESS): ...these new use types.
(dump_groups): Update accordingly.
(iv_use::mem_type): New member variable.
(address_p): New function.
(record_use): Add a mem_type argument and initialize the new
mem_type field.
(record_group_use): Add a mem_type argument. Use address_p.
Remove obsolete null checks of base_object. Update call to record_use.
(find_interesting_uses_op): Update call to record_group_use.
(find_interesting_uses_cond): Likewise.
(find_interesting_uses_address): Likewise.
(get_mem_type_for_internal_fn): New function.
(find_address_like_use): Likewise.
(find_interesting_uses_stmt): Try find_address_like_use before
calling find_interesting_uses_op.
(addr_offset_valid_p): Use the iv mem_type field as the type
of the addressed memory.
(add_autoinc_candidates): Likewise.
(get_address_cost): Likewise.
(split_small_address_groups_p): Use address_p.
(split_address_groups): Likewise.
(add_iv_candidate_for_use): Likewise.
(autoinc_possible_for_pair): Likewise.
(rewrite_groups): Likewise.
(get_use_type): Check for USE_REF_ADDRESS instead of USE_ADDRESS.
(determine_group_iv_cost): Update after split of USE_ADDRESS.
(get_alias_ptr_type_for_ptr_address): New function.
(rewrite_use_address): Rewrite address uses in calls that were
identified by find_address_like_use.
gcc/testsuite/
* gcc.dg/tree-ssa/scev-9.c: Expect REFERENCE ADDRESS
instead of just ADDRESS.
* gcc.dg/tree-ssa/scev-10.c: Likewise.
* gcc.dg/tree-ssa/scev-11.c: Likewise.
* gcc.dg/tree-ssa/scev-12.c: Likewise.
* gcc.target/aarch64/sve/index_offset_1.c: New test.
* gcc.target/aarch64/sve/index_offset_1_run.c: Likewise.
* gcc.target/aarch64/sve/loop_add_2.c: Likewise.
* gcc.target/aarch64/sve/loop_add_3.c: Likewise.
* gcc.target/aarch64/sve/while_1.c: Check for indexed addressing modes.
* gcc.target/aarch64/sve/while_2.c: Likewise.
* gcc.target/aarch64/sve/while_3.c: Likewise.
* gcc.target/aarch64/sve/while_4.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256628
This patch allows ADDR_EXPR <TARGET_MEM_REF ...>, which is useful
when calling internal functions that take pointers to memory that
is conditionally loaded or stored. This is a prerequisite to the
following ivopts patch.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* expr.c (expand_expr_addr_expr_1): Handle ADDR_EXPRs of
TARGET_MEM_REFs.
* gimple-expr.h (is_gimple_addressable): Likewise.
* gimple-expr.c (is_gimple_address): Likewise.
* internal-fn.c (expand_call_mem_ref): New function.
(expand_mask_load_optab_fn): Use it.
(expand_mask_store_optab_fn): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256627
This patch removes the restriction that fully-masked loops cannot
have reductions. The key thing here is to make sure that the
reduction accumulator doesn't include any values associated with
inactive lanes; the patch adds a bunch of conditional binary
operations for doing that.
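A hypothetical reduction in a fully-masked loop:

  int
  sum (int *a, int n)
  {
    int res = 0;
    for (int i = 0; i < n; ++i)
      res += a[i];
    return res;
  }

In the final iteration some lanes may be inactive; a conditional
operation such as COND_ADD adds only the active lanes and leaves the
accumulator unchanged elsewhere.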
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (cond_add@var{mode}, cond_sub@var{mode})
(cond_and@var{mode}, cond_ior@var{mode}, cond_xor@var{mode})
(cond_smin@var{mode}, cond_smax@var{mode}, cond_umin@var{mode})
(cond_umax@var{mode}): Document.
* optabs.def (cond_add_optab, cond_sub_optab, cond_and_optab)
(cond_ior_optab, cond_xor_optab, cond_smin_optab, cond_smax_optab)
(cond_umin_optab, cond_umax_optab): New optabs.
* internal-fn.def (COND_ADD, COND_SUB, COND_MIN, COND_MAX, COND_AND)
(COND_IOR, COND_XOR): New internal functions.
* internal-fn.h (get_conditional_internal_fn): Declare.
* internal-fn.c (cond_binary_direct): New macro.
(expand_cond_binary_optab_fn): Likewise.
(direct_cond_binary_optab_supported_p): Likewise.
(get_conditional_internal_fn): New function.
* tree-vect-loop.c (vectorizable_reduction): Handle fully-masked loops.
Cope with reduction statements that are vectorized as calls rather
than assignments.
* config/aarch64/aarch64-sve.md (cond_<optab><mode>): New insns.
* config/aarch64/iterators.md (UNSPEC_COND_ADD, UNSPEC_COND_SUB)
(UNSPEC_COND_SMAX, UNSPEC_COND_UMAX, UNSPEC_COND_SMIN)
(UNSPEC_COND_UMIN, UNSPEC_COND_AND, UNSPEC_COND_ORR)
(UNSPEC_COND_EOR): New unspecs.
(optab): Add mappings for them.
(SVE_COND_INT_OP, SVE_COND_FP_OP): New int iterators.
(sve_int_op, sve_fp_op): New int attributes.
gcc/testsuite/
* gcc.dg/vect/pr60482.c: Remove XFAIL for variable-length vectors.
* gcc.target/aarch64/sve/reduc_1.c: Expect the loop operations
to be predicated.
* gcc.target/aarch64/sve/slp_5.c: Check for a fully-masked loop.
* gcc.target/aarch64/sve/slp_7.c: Likewise.
* gcc.target/aarch64/sve/reduc_5.c: New test.
* gcc.target/aarch64/sve/slp_13.c: Likewise.
* gcc.target/aarch64/sve/slp_13_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256626
This patch adds support for using a single fully-predicated loop instead
of a vector loop and a scalar tail. An SVE WHILELO instruction generates
the predicate for each iteration of the loop, given the current scalar
iv value and the loop bound. This operation is wrapped up in a new internal
function called WHILE_ULT. E.g.:
  WHILE_ULT (0, 3, { 0, 0, 0, 0 }) -> { 1, 1, 1, 0 }
  WHILE_ULT (UINT_MAX - 1, UINT_MAX, { 0, 0, 0, 0 }) -> { 1, 0, 0, 0 }
The third WHILE_ULT argument is needed to make the operation
unambiguous: without it, WHILE_ULT (0, 3) for one vector type would
seem equivalent to WHILE_ULT (0, 3) for another, even if the types have
different numbers of elements.
Note that the patch uses "mask" and "fully-masked" instead of
"predicate" and "fully-predicated", to follow existing GCC terminology.
This patch just handles the simple cases, punting for things like
reductions and live-out values. Later patches remove most of these
restrictions.
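For example (hypothetical), this loop needs no scalar tail even when
n is not a multiple of the vector length:

  void
  add_one (int *a, int *b, int n)
  {
    for (int i = 0; i < n; ++i)
      a[i] = b[i] + 1;
  }

Each vector iteration computes its mask with WHILE_ULT from the
current value of i and the bound n; the final iteration simply has
fewer active lanes.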
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* optabs.def (while_ult_optab): New optab.
* doc/md.texi (while_ult@var{m}@var{n}): Document.
* internal-fn.def (WHILE_ULT): New internal function.
* internal-fn.h (direct_internal_fn_supported_p): New override
that takes two types as argument.
* internal-fn.c (while_direct): New macro.
(expand_while_optab_fn): New function.
(convert_optab_supported_p): Likewise.
(direct_while_optab_supported_p): New macro.
* wide-int.h (wi::udiv_ceil): New function.
* tree-vectorizer.h (rgroup_masks): New structure.
(vec_loop_masks): New typedef.
(_loop_vec_info): Add masks, mask_compare_type, can_fully_mask_p
and fully_masked_p.
(LOOP_VINFO_CAN_FULLY_MASK_P, LOOP_VINFO_FULLY_MASKED_P)
(LOOP_VINFO_MASKS, LOOP_VINFO_MASK_COMPARE_TYPE): New macros.
(vect_max_vf): New function.
(slpeel_make_loop_iterate_ntimes): Delete.
(vect_set_loop_condition, vect_get_loop_mask_type, vect_gen_while)
(vect_halve_mask_nunits, vect_double_mask_nunits): Declare.
(vect_record_loop_mask, vect_get_loop_mask): Likewise.
* tree-vect-loop-manip.c: Include tree-ssa-loop-niter.h,
internal-fn.h, stor-layout.h and optabs-query.h.
(vect_set_loop_mask): New function.
(add_preheader_seq): Likewise.
(add_header_seq): Likewise.
(interleave_supported_p): Likewise.
(vect_maybe_permute_loop_masks): Likewise.
(vect_set_loop_masks_directly): Likewise.
(vect_set_loop_condition_masked): Likewise.
(vect_set_loop_condition_unmasked): New function, split out from
slpeel_make_loop_iterate_ntimes.
(slpeel_make_loop_iterate_ntimes): Rename to...
(vect_set_loop_condition): ...this. Use vect_set_loop_condition_masked
for fully-masked loops and vect_set_loop_condition_unmasked otherwise.
(vect_do_peeling): Update call accordingly.
(vect_gen_vector_loop_niters): Use VF as the step for fully-masked
loops.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
mask_compare_type, can_fully_mask_p and fully_masked_p.
(release_vec_loop_masks): New function.
(_loop_vec_info): Use it to free the loop masks.
(can_produce_all_loop_masks_p): New function.
(vect_get_max_nscalars_per_iter): Likewise.
(vect_verify_full_masking): Likewise.
(vect_analyze_loop_2): Save LOOP_VINFO_CAN_FULLY_MASK_P around
retries, and free the mask rgroups before retrying. Check loop-wide
reasons for disallowing fully-masked loops. Make the final decision
about whether to use a fully-masked loop or not.
(vect_estimate_min_profitable_iters): Do not assume that peeling
for the number of iterations will be needed for fully-masked loops.
(vectorizable_reduction): Disable fully-masked loops.
(vectorizable_live_operation): Likewise.
(vect_halve_mask_nunits): New function.
(vect_double_mask_nunits): Likewise.
(vect_record_loop_mask): Likewise.
(vect_get_loop_mask): Likewise.
(vect_transform_loop): Handle the case in which the final loop
iteration might handle a partial vector. Call vect_set_loop_condition
instead of slpeel_make_loop_iterate_ntimes.
* tree-vect-stmts.c: Include tree-ssa-loop-niter.h and gimple-fold.h.
(check_load_store_masking): New function.
(prepare_load_store_mask): Likewise.
(vectorizable_store): Handle fully-masked loops.
(vectorizable_load): Likewise.
(supportable_widening_operation): Use vect_halve_mask_nunits for
booleans.
(supportable_narrowing_operation): Likewise vect_double_mask_nunits.
(vect_gen_while): New function.
* config/aarch64/aarch64.md (umax<mode>3): New expander.
(aarch64_uqdec<mode>): New insn.
gcc/testsuite/
* gcc.dg/tree-ssa/cunroll-10.c: Disable vectorization.
* gcc.dg/tree-ssa/peel1.c: Likewise.
* gcc.dg/vect/vect-load-lanes-peeling-1.c: Remove XFAIL for
variable-length vectors.
* gcc.target/aarch64/sve/vcond_6.c: XFAIL test for AND.
* gcc.target/aarch64/sve/vec_bool_cmp_1.c: Expect BIC instead of NOT.
* gcc.target/aarch64/sve/slp_1.c: Check for a fully-masked loop.
* gcc.target/aarch64/sve/slp_2.c: Likewise.
* gcc.target/aarch64/sve/slp_3.c: Likewise.
* gcc.target/aarch64/sve/slp_4.c: Likewise.
* gcc.target/aarch64/sve/slp_6.c: Likewise.
* gcc.target/aarch64/sve/slp_8.c: New test.
* gcc.target/aarch64/sve/slp_8_run.c: Likewise.
* gcc.target/aarch64/sve/slp_9.c: Likewise.
* gcc.target/aarch64/sve/slp_9_run.c: Likewise.
* gcc.target/aarch64/sve/slp_10.c: Likewise.
* gcc.target/aarch64/sve/slp_10_run.c: Likewise.
* gcc.target/aarch64/sve/slp_11.c: Likewise.
* gcc.target/aarch64/sve/slp_11_run.c: Likewise.
* gcc.target/aarch64/sve/slp_12.c: Likewise.
* gcc.target/aarch64/sve/slp_12_run.c: Likewise.
* gcc.target/aarch64/sve/ld1r_2.c: Likewise.
* gcc.target/aarch64/sve/ld1r_2_run.c: Likewise.
* gcc.target/aarch64/sve/while_1.c: Likewise.
* gcc.target/aarch64/sve/while_2.c: Likewise.
* gcc.target/aarch64/sve/while_3.c: Likewise.
* gcc.target/aarch64/sve/while_4.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256625
This patch adds support for the SVE bitwise reduction instructions
(ANDV, ORV and EORV). It's a fairly mechanical extension of existing
REDUC_* operators.
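A hypothetical example of a reduction that can now use ORV:

  unsigned
  or_all (unsigned *a, int n)
  {
    unsigned res = 0;
    for (int i = 0; i < n; ++i)
      res |= a[i];
    return res;
  }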
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* optabs.def (reduc_and_scal_optab, reduc_ior_scal_optab)
(reduc_xor_scal_optab): New optabs.
* doc/md.texi (reduc_and_scal_@var{m}, reduc_ior_scal_@var{m})
(reduc_xor_scal_@var{m}): Document.
* doc/sourcebuild.texi (vect_logical_reduc): Likewise.
* internal-fn.def (IFN_REDUC_AND, IFN_REDUC_IOR, IFN_REDUC_XOR): New
internal functions.
* fold-const-call.c (fold_const_call): Handle them.
* tree-vect-loop.c (reduction_fn_for_scalar_code): Return the new
internal functions for BIT_AND_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR.
* config/aarch64/aarch64-sve.md (reduc_<bit_reduc>_scal_<mode>)
(*reduc_<bit_reduc>_scal_<mode>): New patterns.
* config/aarch64/iterators.md (UNSPEC_ANDV, UNSPEC_ORV)
(UNSPEC_XORV): New unspecs.
(optab): Add entries for them.
(BITWISEV): New int iterator.
(bit_reduc_op): New int attribute.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_logical_reduc):
New proc.
* gcc.dg/vect/vect-reduc-or_1.c: Also run for vect_logical_reduc
and add an associated scan-dump test. Prevent vectorization
of the first two loops.
* gcc.dg/vect/vect-reduc-or_2.c: Likewise.
* gcc.target/aarch64/sve/reduc_1.c: Add AND, IOR and XOR reductions.
* gcc.target/aarch64/sve/reduc_2.c: Likewise.
* gcc.target/aarch64/sve/reduc_1_run.c: Likewise.
(INIT_VECTOR): Tweak initial value so that some bits are always set.
* gcc.target/aarch64/sve/reduc_2_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256624
Two things stopped us using SLP reductions with variable-length vectors:
(1) We didn't have a way of constructing the initial vector.
This patch does it by creating a vector full of the neutral
identity value and then using a shift-and-insert function
to insert any non-identity inputs into the low-numbered elements.
(The non-identity values are needed for double reductions.)
Alternatively, for unchained MIN/MAX reductions that have no neutral
value, we instead use the same duplicate-and-interleave approach as
for SLP constant and external definitions (added by a previous
patch).
(2) The epilogue for constant-length vectors would extract the vector
elements associated with each SLP statement and do scalar arithmetic
on these individual elements. For variable-length vectors, the patch
instead creates a reduction vector for each SLP statement, replacing
the elements for other SLP statements with the identity value.
It then uses a hardware reduction instruction on each vector.
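A hypothetical two-statement SLP reduction of the kind affected
(n is assumed to be even):

  void
  sums (int *a, int n, int *res0, int *res1)
  {
    int s0 = *res0, s1 = *res1;
    for (int i = 0; i < n; i += 2)
      {
        s0 += a[i];      /* SLP statement 1 */
        s1 += a[i + 1];  /* SLP statement 2 */
      }
    *res0 = s0;
    *res1 = s1;
  }

For (1), the initial vector is filled with zeros (the neutral value
for addition) and s0 and s1 are shifted into the low elements; for
(2), each scalar result gets its own reduction vector, with the other
statement's lanes replaced by zeros before the hardware reduction.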
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (vec_shl_insert_@var{m}): New optab.
* internal-fn.def (VEC_SHL_INSERT): New internal function.
* optabs.def (vec_shl_insert_optab): New optab.
* tree-vectorizer.h (can_duplicate_and_interleave_p): Declare.
(duplicate_and_interleave): Likewise.
* tree-vect-loop.c: Include internal-fn.h.
(neutral_op_for_slp_reduction): New function, split out from
get_initial_defs_for_reduction.
(get_initial_def_for_reduction): Handle option 2 for variable-length
vectors by loading the neutral value into a vector and then shifting
the initial value into element 0.
(get_initial_defs_for_reduction): Replace the code argument with
the neutral value calculated by neutral_op_for_slp_reduction.
Use gimple_build_vector for constant-length vectors.
Use IFN_VEC_SHL_INSERT for variable-length vectors if all
but the first group_size elements have a neutral value.
Use duplicate_and_interleave otherwise.
(vect_create_epilog_for_reduction): Take a neutral_op parameter.
Update call to get_initial_defs_for_reduction. Handle SLP
reductions for variable-length vectors by creating one vector
result for each scalar result, with the elements associated
with other scalar results stubbed out with the neutral value.
(vectorizable_reduction): Call neutral_op_for_slp_reduction.
Require IFN_VEC_SHL_INSERT for double reductions on
variable-length vectors, or SLP reductions that have
a neutral value. Require can_duplicate_and_interleave_p
support for variable-length unchained SLP reductions if there
is no neutral value, such as for MIN/MAX reductions. Also require
the number of vector elements to be a multiple of the number of
SLP statements when doing variable-length unchained SLP reductions.
Update call to vect_create_epilog_for_reduction.
* tree-vect-slp.c (can_duplicate_and_interleave_p): Make public
and remove initial values.
(duplicate_and_interleave): Make public.
* config/aarch64/aarch64.md (UNSPEC_INSR): New unspec.
* config/aarch64/aarch64-sve.md (vec_shl_insert_<mode>): New insn.
gcc/testsuite/
* gcc.dg/vect/pr37027.c: Remove XFAIL for variable-length vectors.
* gcc.dg/vect/pr67790.c: Likewise.
* gcc.dg/vect/slp-reduc-1.c: Likewise.
* gcc.dg/vect/slp-reduc-2.c: Likewise.
* gcc.dg/vect/slp-reduc-3.c: Likewise.
* gcc.dg/vect/slp-reduc-5.c: Likewise.
* gcc.target/aarch64/sve/slp_5.c: New test.
* gcc.target/aarch64/sve/slp_5_run.c: Likewise.
* gcc.target/aarch64/sve/slp_6.c: Likewise.
* gcc.target/aarch64/sve/slp_6_run.c: Likewise.
* gcc.target/aarch64/sve/slp_7.c: Likewise.
* gcc.target/aarch64/sve/slp_7_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256623
This patch adds support for vectorising SLP definitions that are
constant or external (i.e. from outside the loop) when the vectorisation
factor isn't known at compile time. It can only handle cases where the
number of SLP statements is a power of 2.
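For example (hypothetical, with n assumed even), the SLP group below
has two constant definitions, 1 and 2. For a variable-length vector
the constant { 1, 2, 1, 2, ... } is built by duplicating each value
across a whole vector and interleaving the two results:

  void
  add_consts (int *a, int n)
  {
    for (int i = 0; i < n; i += 2)
      {
        a[i] += 1;
        a[i + 1] += 2;
      }
  }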
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-slp.c: Include gimple-fold.h and internal-fn.h
(can_duplicate_and_interleave_p): New function.
(vect_get_and_check_slp_defs): Take the vector of statements
rather than just the current one. Remove excess parentheses.
Restrict rejection of vect_constant_def and vect_external_def
for variable-length vectors to boolean types, or types for which
can_duplicate_and_interleave_p is false.
(vect_build_slp_tree_2): Update call to vect_get_and_check_slp_defs.
(duplicate_and_interleave): New function.
(vect_get_constant_vectors): Use gimple_build_vector for
constant-length vectors and suitable variable-length constant
vectors. Use duplicate_and_interleave for other variable-length
vectors. Don't defer the update when inserting new statements.
gcc/testsuite/
* gcc.dg/vect/no-scevccp-slp-30.c: Don't XFAIL for vect_variable_length
&& vect_load_lanes.
* gcc.dg/vect/slp-1.c: Likewise.
* gcc.dg/vect/slp-10.c: Likewise.
* gcc.dg/vect/slp-12b.c: Likewise.
* gcc.dg/vect/slp-12c.c: Likewise.
* gcc.dg/vect/slp-17.c: Likewise.
* gcc.dg/vect/slp-19b.c: Likewise.
* gcc.dg/vect/slp-20.c: Likewise.
* gcc.dg/vect/slp-21.c: Likewise.
* gcc.dg/vect/slp-22.c: Likewise.
* gcc.dg/vect/slp-23.c: Likewise.
* gcc.dg/vect/slp-24-big-array.c: Likewise.
* gcc.dg/vect/slp-24.c: Likewise.
* gcc.dg/vect/slp-28.c: Likewise.
* gcc.dg/vect/slp-39.c: Likewise.
* gcc.dg/vect/slp-6.c: Likewise.
* gcc.dg/vect/slp-7.c: Likewise.
* gcc.dg/vect/slp-cond-1.c: Likewise.
* gcc.dg/vect/slp-cond-2-big-array.c: Likewise.
* gcc.dg/vect/slp-cond-2.c: Likewise.
* gcc.dg/vect/slp-multitypes-1.c: Likewise.
* gcc.dg/vect/slp-multitypes-8.c: Likewise.
* gcc.dg/vect/slp-multitypes-9.c: Likewise.
* gcc.dg/vect/slp-multitypes-10.c: Likewise.
* gcc.dg/vect/slp-multitypes-12.c: Likewise.
* gcc.dg/vect/slp-perm-6.c: Likewise.
* gcc.dg/vect/slp-widen-mult-half.c: Likewise.
* gcc.dg/vect/vect-live-slp-1.c: Likewise.
* gcc.dg/vect/vect-live-slp-2.c: Likewise.
* gcc.dg/vect/pr33953.c: Don't XFAIL for vect_variable_length.
* gcc.dg/vect/slp-12a.c: Likewise.
* gcc.dg/vect/slp-14.c: Likewise.
* gcc.dg/vect/slp-15.c: Likewise.
* gcc.dg/vect/slp-multitypes-2.c: Likewise.
* gcc.dg/vect/slp-multitypes-4.c: Likewise.
* gcc.dg/vect/slp-multitypes-5.c: Likewise.
* gcc.target/aarch64/sve/slp_1.c: New test.
* gcc.target/aarch64/sve/slp_1_run.c: Likewise.
* gcc.target/aarch64/sve/slp_2.c: Likewise.
* gcc.target/aarch64/sve/slp_2_run.c: Likewise.
* gcc.target/aarch64/sve/slp_3.c: Likewise.
* gcc.target/aarch64/sve/slp_3_run.c: Likewise.
* gcc.target/aarch64/sve/slp_4.c: Likewise.
* gcc.target/aarch64/sve/slp_4_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256622
We had:
  if (vec_outside_cost <= 0)
    min_profitable_iters = 0;
  else
    {
      min_profitable_iters = ((vec_outside_cost - scalar_outside_cost)
                              * assumed_vf
                              - vec_inside_cost * peel_iters_prologue
                              - vec_inside_cost * peel_iters_epilogue)
                             / ((scalar_single_iter_cost * assumed_vf)
                                - vec_inside_cost);
which can lead to negative min_profitable_iters when the *_outside_costs
are the same and peel_iters_epilogue is nonzero (e.g. if we're peeling
for gaps).
This is tested as part of the patch that adds support for fully-predicated
loops.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Make sure
min_profitable_iters doesn't go negative.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256621
This patch adds support for vectorising groups of IFN_MASK_LOADs
and IFN_MASK_STOREs using conditional load/store-lanes instructions.
This requires new internal functions to represent the result
(IFN_MASK_{LOAD,STORE}_LANES), as well as associated optabs.
The normal IFN_{LOAD,STORE}_LANES functions are const operations
that logically just perform the permute: the load or store is
encoded as a MEM operand to the call statement. In contrast,
the IFN_MASK_{LOAD,STORE}_LANES functions use the same kind of
interface as IFN_MASK_{LOAD,STORE}, since the memory is only
conditionally accessed.
The AArch64 patterns were added as part of the main LD[234]/ST[234] patch.
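A hypothetical group of conditional accesses that can now use a
masked LD2:

  struct pair { int x, y; };

  void
  cond_sum (int *restrict dst, struct pair *restrict s,
            int *restrict c, int n)
  {
    for (int i = 0; i < n; ++i)
      if (c[i])
        dst[i] = s[i].x + s[i].y;
  }

The two loads from s[i] form a group and must only happen where the
mask is true, which is what IFN_MASK_LOAD_LANES represents.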
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (vec_mask_load_lanes@var{m}@var{n}): Document.
(vec_mask_store_lanes@var{m}@var{n}): Likewise.
* optabs.def (vec_mask_load_lanes_optab): New optab.
(vec_mask_store_lanes_optab): Likewise.
* internal-fn.def (MASK_LOAD_LANES): New internal function.
(MASK_STORE_LANES): Likewise.
* internal-fn.c (mask_load_lanes_direct): New macro.
(mask_store_lanes_direct): Likewise.
(expand_mask_load_optab_fn): Handle masked operations.
(expand_mask_load_lanes_optab_fn): New macro.
(expand_mask_store_optab_fn): Handle masked operations.
(expand_mask_store_lanes_optab_fn): New macro.
(direct_mask_load_lanes_optab_supported_p): Likewise.
(direct_mask_store_lanes_optab_supported_p): Likewise.
* tree-vectorizer.h (vect_store_lanes_supported): Take a masked_p
parameter.
(vect_load_lanes_supported): Likewise.
* tree-vect-data-refs.c (strip_conversion): New function.
(can_group_stmts_p): Likewise.
(vect_analyze_data_ref_accesses): Use it instead of checking
for a pair of assignments.
(vect_store_lanes_supported): Take a masked_p parameter.
(vect_load_lanes_supported): Likewise.
* tree-vect-loop.c (vect_analyze_loop_2): Update calls to
vect_store_lanes_supported and vect_load_lanes_supported.
* tree-vect-slp.c (vect_analyze_slp_instance): Likewise.
* tree-vect-stmts.c (get_group_load_store_type): Take a masked_p
parameter. Don't allow gaps for masked accesses.
Use vect_get_store_rhs. Update calls to vect_store_lanes_supported
and vect_load_lanes_supported.
(get_load_store_type): Take a masked_p parameter and update
call to get_group_load_store_type.
(vectorizable_store): Update call to get_load_store_type.
Handle IFN_MASK_STORE_LANES.
(vectorizable_load): Update call to get_load_store_type.
Handle IFN_MASK_LOAD_LANES.
gcc/testsuite/
* gcc.dg/vect/vect-ooo-group-1.c: New test.
* gcc.target/aarch64/sve/mask_struct_load_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_1_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_2.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_2_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_3_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_4.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_5.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_6.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_7.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_8.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_1_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_2.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_2_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_3.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_3_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_4.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256620
This patch adds tests for the SVE structure mode move patterns
and for LD[234] and ST[234] vectorisation.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/testsuite/
* gcc.target/aarch64/sve/struct_move_1.c: New test.
* gcc.target/aarch64/sve/struct_move_2.c: Likewise.
* gcc.target/aarch64/sve/struct_move_3.c: Likewise.
* gcc.target/aarch64/sve/struct_move_4.c: Likewise.
* gcc.target/aarch64/sve/struct_move_5.c: Likewise.
* gcc.target/aarch64/sve/struct_move_6.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_1.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_1_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_2.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_2_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_3.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_3_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_4.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_4_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_5.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_5_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_6.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_6_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_7.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_7_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_8.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_8_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_9.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_9_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_10.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_10_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_11.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_11_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_12.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_12_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_13.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_13_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_14.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_15.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_16.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_17.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256619
This patch adds support for SVE LD[234], ST[234] and associated
structure modes. Unlike Advanced SIMD, these modes are extra-long
vector modes instead of integer modes.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64-modes.def: Define x2, x3 and x4 vector
modes for SVE.
* config/aarch64/aarch64-protos.h
(aarch64_sve_struct_memory_operand_p): Declare.
* config/aarch64/iterators.md (SVE_STRUCT): New mode iterator.
(vector_count, insn_length, VSINGLE, vsingle): New mode attributes.
(VPRED, vpred): Handle SVE structure modes.
* config/aarch64/constraints.md (Utx): New constraint.
* config/aarch64/predicates.md (aarch64_sve_struct_memory_operand)
(aarch64_sve_struct_nonimmediate_operand): New predicates.
* config/aarch64/aarch64.md (UNSPEC_LDN, UNSPEC_STN): New unspecs.
* config/aarch64/aarch64-sve.md (mov<mode>, *aarch64_sve_mov<mode>_le)
(*aarch64_sve_mov<mode>_be, pred_mov<mode>): New patterns for
structure modes. Split into pieces after RA.
(vec_load_lanes<mode><vsingle>, vec_mask_load_lanes<mode><vsingle>)
(vec_store_lanes<mode><vsingle>, vec_mask_store_lanes<mode><vsingle>):
New patterns.
* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Handle
SVE structure modes.
(aarch64_classify_address): Likewise.
(sizetochar): Move earlier in file.
(aarch64_print_operand): Handle SVE register lists.
(aarch64_array_mode): New function.
(aarch64_sve_struct_memory_operand_p): Likewise.
(TARGET_ARRAY_MODE): Redefine.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_load_lanes):
Return true for SVE too.
* g++.dg/vect/pr36648.cc: XFAIL for variable-length vectors
if load/store lanes are supported.
* gcc.dg/vect/slp-10.c: Likewise.
* gcc.dg/vect/slp-12c.c: Likewise.
* gcc.dg/vect/slp-17.c: Likewise.
* gcc.dg/vect/slp-33.c: Likewise.
* gcc.dg/vect/slp-6.c: Likewise.
* gcc.dg/vect/slp-cond-1.c: Likewise.
* gcc.dg/vect/slp-multitypes-11-big-array.c: Likewise.
* gcc.dg/vect/slp-multitypes-11.c: Likewise.
* gcc.dg/vect/slp-multitypes-12.c: Likewise.
* gcc.dg/vect/slp-perm-5.c: Remove XFAIL for variable-length SVE.
* gcc.dg/vect/slp-perm-6.c: Likewise.
* gcc.dg/vect/slp-perm-9.c: Likewise.
* gcc.dg/vect/slp-reduc-6.c: Remove XFAIL for variable-length vectors.
* gcc.dg/vect/vect-load-lanes-peeling-1.c: Expect an epilogue loop
for variable-length vectors.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256618
|
|
So far we've used integer modes for LD[234] and ST[234] arrays.
That doesn't scale well to SVE, since the sizes aren't fixed at
compile time (and even if they were, we wouldn't want integers
to be so wide).
This patch lets the target use double-, triple- and quadruple-length
vectors instead.
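For illustration, a minimal sketch of the default hook, going by the
hook_optmode_mode_uhwi_none name in the entry below (the real body may
differ in details): it declines, so mode_for_array keeps its
integer-mode fallback.
/* Default TARGET_ARRAY_MODE hook: return no mode, so the caller
   falls back to choosing an integer mode of the right size.  */
opt_machine_mode
hook_optmode_mode_uhwi_none (machine_mode, unsigned HOST_WIDE_INT)
{
  return opt_machine_mode ();
}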
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* target.def (array_mode): New target hook.
* doc/tm.texi.in (TARGET_ARRAY_MODE): New hook.
* doc/tm.texi: Regenerate.
* hooks.h (hook_optmode_mode_uhwi_none): Declare.
* hooks.c (hook_optmode_mode_uhwi_none): New function.
* tree-vect-data-refs.c (vect_lanes_optab_supported_p): Use
targetm.array_mode.
* stor-layout.c (mode_for_array): Likewise. Support polynomial
type sizes.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256617
|
|
fold_binary_loc assumed that if the type of the result wasn't a vector,
the operands wouldn't be either. This isn't necessarily true for
EQ_EXPR and NE_EXPR of vector masks, which can return a single scalar
for the mask as a whole.
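For context, a GNU C illustration of masks as vector values (my
sketch, not the failing case itself, which has a scalar result and
arises internally):
typedef int v4si __attribute__ ((vector_size (16)));

/* (a == b) and (c == d) are vector masks; here comparing them
   yields another vector, but internally GCC can also compare two
   masks as a whole, giving a scalar result from vector operands.  */
v4si
mask_eq (v4si a, v4si b, v4si c, v4si d)
{
  return (a == b) == (c == d);
}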
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* fold-const.c (fold_binary_loc): Check the argument types
rather than the result type when testing for a vector operation.
gcc/testsuite/
* gcc.target/aarch64/sve/vec_bool_cmp_1.c: New test.
* gcc.target/aarch64/sve/vec_bool_cmp_1_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256616
|
|
This patch adds support for unwinding frames that use the SVE
pseudo VG register. We want this register to act like a normal
register if the CFI explicitly sets it, but want to provide a
default value otherwise. Computing the default value requires
an SVE target, so we only want to compute it on demand.
aarch64_vg uses a hard-coded .inst in order to avoid a build
dependency on binutils 2.28 or later.
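A minimal sketch of that function (simplified from value-unwind.h;
assumption: 0x04e0e3e0 is the encoding of CNTD X0):
/* Return the current value of VG: CNTD X0 counts the number of
   64-bit doublewords in an SVE vector, emitted as a raw .inst so
   that pre-SVE assemblers can build libgcc.  */
static inline long
aarch64_vg (void)
{
  register long vg asm ("x0");
  asm (".inst 0x04e0e3e0" : "=r" (vg)); /* cntd x0 */
  return vg;
}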
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* doc/tm.texi.in (DWARF_LAZY_REGISTER_VALUE): Document.
* doc/tm.texi: Regenerate.
libgcc/
* config/aarch64/value-unwind.h (aarch64_vg): New function.
(DWARF_LAZY_REGISTER_VALUE): Define.
* unwind-dw2.c (_Unwind_GetGR): Use DWARF_LAZY_REGISTER_VALUE
to provide a fallback register value.
gcc/testsuite/
* g++.target/aarch64/sve/aarch64-sve.exp: New harness.
* g++.target/aarch64/sve/catch_1.C: New test.
* g++.target/aarch64/sve/catch_2.C: Likewise.
* g++.target/aarch64/sve/catch_3.C: Likewise.
* g++.target/aarch64/sve/catch_4.C: Likewise.
* g++.target/aarch64/sve/catch_5.C: Likewise.
* g++.target/aarch64/sve/catch_6.C: Likewise.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
From-SVN: r256615
|
|
This patch adds gcc.target/aarch64 tests for SVE, and forces some
existing Advanced SIMD tests to use -march=armv8-a.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_asm_sve_ok):
New proc.
* gcc.target/aarch64/bic_imm_1.c: Use #pragma GCC target "+nosve".
* gcc.target/aarch64/fmaxmin.c: Likewise.
* gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
* gcc.target/aarch64/orr_imm_1.c: Likewise.
* gcc.target/aarch64/pr62178.c: Likewise.
* gcc.target/aarch64/pr71727-2.c: Likewise.
* gcc.target/aarch64/saddw-1.c: Likewise.
* gcc.target/aarch64/saddw-2.c: Likewise.
* gcc.target/aarch64/uaddw-1.c: Likewise.
* gcc.target/aarch64/uaddw-2.c: Likewise.
* gcc.target/aarch64/uaddw-3.c: Likewise.
* gcc.target/aarch64/vect-add-sub-cond.c: Likewise.
* gcc.target/aarch64/vect-compile.c: Likewise.
* gcc.target/aarch64/vect-faddv-compile.c: Likewise.
* gcc.target/aarch64/vect-fcm-eq-d.c: Likewise.
* gcc.target/aarch64/vect-fcm-eq-f.c: Likewise.
* gcc.target/aarch64/vect-fcm-ge-d.c: Likewise.
* gcc.target/aarch64/vect-fcm-ge-f.c: Likewise.
* gcc.target/aarch64/vect-fcm-gt-d.c: Likewise.
* gcc.target/aarch64/vect-fcm-gt-f.c: Likewise.
* gcc.target/aarch64/vect-fmax-fmin-compile.c: Likewise.
* gcc.target/aarch64/vect-fmaxv-fminv-compile.c: Likewise.
* gcc.target/aarch64/vect-fmovd-zero.c: Likewise.
* gcc.target/aarch64/vect-fmovd.c: Likewise.
* gcc.target/aarch64/vect-fmovf-zero.c: Likewise.
* gcc.target/aarch64/vect-fmovf.c: Likewise.
* gcc.target/aarch64/vect-fp-compile.c: Likewise.
* gcc.target/aarch64/vect-ld1r-compile-fp.c: Likewise.
* gcc.target/aarch64/vect-ld1r-compile.c: Likewise.
* gcc.target/aarch64/vect-movi.c: Likewise.
* gcc.target/aarch64/vect-mull-compile.c: Likewise.
* gcc.target/aarch64/vect-reduc-or_1.c: Likewise.
* gcc.target/aarch64/vect-vaddv.c: Likewise.
* gcc.target/aarch64/vect_saddl_1.c: Likewise.
* gcc.target/aarch64/vect_smlal_1.c: Likewise.
* gcc.target/aarch64/vector_initialization_nostack.c: XFAIL for
fixed-length SVE.
* gcc.target/aarch64/sve/aarch64-sve.exp: New file.
* gcc.target/aarch64/sve/arith_1.c: New test.
* gcc.target/aarch64/sve/const_pred_1.C: Likewise.
* gcc.target/aarch64/sve/const_pred_2.C: Likewise.
* gcc.target/aarch64/sve/const_pred_3.C: Likewise.
* gcc.target/aarch64/sve/const_pred_4.C: Likewise.
* gcc.target/aarch64/sve/cvtf_signed_1.c: Likewise.
* gcc.target/aarch64/sve/cvtf_signed_1_run.c: Likewise.
* gcc.target/aarch64/sve/cvtf_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/cvtf_unsigned_1_run.c: Likewise.
* gcc.target/aarch64/sve/dup_imm_1.c: Likewise.
* gcc.target/aarch64/sve/dup_imm_1_run.c: Likewise.
* gcc.target/aarch64/sve/dup_lane_1.c: Likewise.
* gcc.target/aarch64/sve/ext_1.c: Likewise.
* gcc.target/aarch64/sve/ext_2.c: Likewise.
* gcc.target/aarch64/sve/extract_1.c: Likewise.
* gcc.target/aarch64/sve/extract_2.c: Likewise.
* gcc.target/aarch64/sve/extract_3.c: Likewise.
* gcc.target/aarch64/sve/extract_4.c: Likewise.
* gcc.target/aarch64/sve/fabs_1.c: Likewise.
* gcc.target/aarch64/sve/fcvtz_signed_1.c: Likewise.
* gcc.target/aarch64/sve/fcvtz_signed_1_run.c: Likewise.
* gcc.target/aarch64/sve/fcvtz_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/fcvtz_unsigned_1_run.c: Likewise.
* gcc.target/aarch64/sve/fdiv_1.c: Likewise.
* gcc.target/aarch64/sve/fdup_1.c: Likewise.
* gcc.target/aarch64/sve/fdup_1_run.c: Likewise.
* gcc.target/aarch64/sve/fmad_1.c: Likewise.
* gcc.target/aarch64/sve/fmla_1.c: Likewise.
* gcc.target/aarch64/sve/fmls_1.c: Likewise.
* gcc.target/aarch64/sve/fmsb_1.c: Likewise.
* gcc.target/aarch64/sve/fmul_1.c: Likewise.
* gcc.target/aarch64/sve/fneg_1.c: Likewise.
* gcc.target/aarch64/sve/fnmad_1.c: Likewise.
* gcc.target/aarch64/sve/fnmla_1.c: Likewise.
* gcc.target/aarch64/sve/fnmls_1.c: Likewise.
* gcc.target/aarch64/sve/fnmsb_1.c: Likewise.
* gcc.target/aarch64/sve/fp_arith_1.c: Likewise.
* gcc.target/aarch64/sve/frinta_1.c: Likewise.
* gcc.target/aarch64/sve/frinti_1.c: Likewise.
* gcc.target/aarch64/sve/frintm_1.c: Likewise.
* gcc.target/aarch64/sve/frintp_1.c: Likewise.
* gcc.target/aarch64/sve/frintx_1.c: Likewise.
* gcc.target/aarch64/sve/frintz_1.c: Likewise.
* gcc.target/aarch64/sve/fsqrt_1.c: Likewise.
* gcc.target/aarch64/sve/fsubr_1.c: Likewise.
* gcc.target/aarch64/sve/index_1.c: Likewise.
* gcc.target/aarch64/sve/index_1_run.c: Likewise.
* gcc.target/aarch64/sve/ld1r_1.c: Likewise.
* gcc.target/aarch64/sve/load_const_offset_1.c: Likewise.
* gcc.target/aarch64/sve/load_const_offset_2.c: Likewise.
* gcc.target/aarch64/sve/load_const_offset_3.c: Likewise.
* gcc.target/aarch64/sve/load_scalar_offset_1.c: Likewise.
* gcc.target/aarch64/sve/logical_1.c: Likewise.
* gcc.target/aarch64/sve/loop_add_1.c: Likewise.
* gcc.target/aarch64/sve/loop_add_1_run.c: Likewise.
* gcc.target/aarch64/sve/mad_1.c: Likewise.
* gcc.target/aarch64/sve/maxmin_1.c: Likewise.
* gcc.target/aarch64/sve/maxmin_1_run.c: Likewise.
* gcc.target/aarch64/sve/maxmin_strict_1.c: Likewise.
* gcc.target/aarch64/sve/maxmin_strict_1_run.c: Likewise.
* gcc.target/aarch64/sve/mla_1.c: Likewise.
* gcc.target/aarch64/sve/mls_1.c: Likewise.
* gcc.target/aarch64/sve/mov_rr_1.c: Likewise.
* gcc.target/aarch64/sve/msb_1.c: Likewise.
* gcc.target/aarch64/sve/mul_1.c: Likewise.
* gcc.target/aarch64/sve/neg_1.c: Likewise.
* gcc.target/aarch64/sve/nlogical_1.c: Likewise.
* gcc.target/aarch64/sve/nlogical_1_run.c: Likewise.
* gcc.target/aarch64/sve/pack_1.c: Likewise.
* gcc.target/aarch64/sve/pack_1_run.c: Likewise.
* gcc.target/aarch64/sve/pack_fcvt_signed_1.c: Likewise.
* gcc.target/aarch64/sve/pack_fcvt_signed_1_run.c: Likewise.
* gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/pack_fcvt_unsigned_1_run.c: Likewise.
* gcc.target/aarch64/sve/pack_float_1.c: Likewise.
* gcc.target/aarch64/sve/pack_float_1_run.c: Likewise.
* gcc.target/aarch64/sve/popcount_1.c: Likewise.
* gcc.target/aarch64/sve/popcount_1_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_1.c: Likewise.
* gcc.target/aarch64/sve/reduc_1_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_2.c: Likewise.
* gcc.target/aarch64/sve/reduc_2_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_3.c: Likewise.
* gcc.target/aarch64/sve/rev_1.c: Likewise.
* gcc.target/aarch64/sve/revb_1.c: Likewise.
* gcc.target/aarch64/sve/revh_1.c: Likewise.
* gcc.target/aarch64/sve/revw_1.c: Likewise.
* gcc.target/aarch64/sve/shift_1.c: Likewise.
* gcc.target/aarch64/sve/single_1.c: Likewise.
* gcc.target/aarch64/sve/single_2.c: Likewise.
* gcc.target/aarch64/sve/single_3.c: Likewise.
* gcc.target/aarch64/sve/single_4.c: Likewise.
* gcc.target/aarch64/sve/spill_1.c: Likewise.
* gcc.target/aarch64/sve/store_scalar_offset_1.c: Likewise.
* gcc.target/aarch64/sve/subr_1.c: Likewise.
* gcc.target/aarch64/sve/trn1_1.c: Likewise.
* gcc.target/aarch64/sve/trn2_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_fcvt_signed_1_run.c: Likewise.
* gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_fcvt_unsigned_1_run.c: Likewise.
* gcc.target/aarch64/sve/unpack_float_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_float_1_run.c: Likewise.
* gcc.target/aarch64/sve/unpack_signed_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_signed_1_run.c: Likewise.
* gcc.target/aarch64/sve/unpack_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_unsigned_1_run.c: Likewise.
* gcc.target/aarch64/sve/uzp1_1.c: Likewise.
* gcc.target/aarch64/sve/uzp1_1_run.c: Likewise.
* gcc.target/aarch64/sve/uzp2_1.c: Likewise.
* gcc.target/aarch64/sve/uzp2_1_run.c: Likewise.
* gcc.target/aarch64/sve/vcond_1.C: Likewise.
* gcc.target/aarch64/sve/vcond_1_run.C: Likewise.
* gcc.target/aarch64/sve/vcond_2.c: Likewise.
* gcc.target/aarch64/sve/vcond_2_run.c: Likewise.
* gcc.target/aarch64/sve/vcond_3.c: Likewise.
* gcc.target/aarch64/sve/vcond_4.c: Likewise.
* gcc.target/aarch64/sve/vcond_4_run.c: Likewise.
* gcc.target/aarch64/sve/vcond_5.c: Likewise.
* gcc.target/aarch64/sve/vcond_5_run.c: Likewise.
* gcc.target/aarch64/sve/vcond_6.c: Likewise.
* gcc.target/aarch64/sve/vcond_6_run.c: Likewise.
* gcc.target/aarch64/sve/vec_init_1.c: Likewise.
* gcc.target/aarch64/sve/vec_init_1_run.c: Likewise.
* gcc.target/aarch64/sve/vec_init_2.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_1.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_1_run.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_1_overrange_run.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_const_1.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_const_1_overrun.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_const_1_run.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_const_single_1.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_const_single_1_run.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_single_1.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_single_1_run.c: Likewise.
* gcc.target/aarch64/sve/zip1_1.c: Likewise.
* gcc.target/aarch64/sve/zip2_1.c: Likewise.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256614
|
|
This patch adds new target selectors for SVE and updates existing
selectors accordingly. It also XFAILs some tests that don't yet
work for some SVE modes; most of these go away with follow-on
vectorisation enhancements.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_sve)
(aarch64_sve_bits, check_effective_target_aarch64_sve_hw)
(aarch64_sve_hw_bits, check_effective_target_aarch64_sve256_hw):
New procedures.
(check_effective_target_vect_perm): Handle SVE.
(check_effective_target_vect_perm_byte): Likewise.
(check_effective_target_vect_perm_short): Likewise.
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Likewise.
(check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
(check_effective_target_vect_widen_mult_hi_to_si): Likewise.
(check_effective_target_vect_element_align_preferred): Likewise.
(check_effective_target_vect_align_stack_vars): Likewise.
(check_effective_target_vect_load_lanes): Likewise.
(check_effective_target_vect_masked_store): Likewise.
(available_vector_sizes): Use aarch64_sve_bits for SVE.
* gcc.dg/vect/tree-vect.h (VECTOR_BITS): Define appropriately
for SVE.
* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Add SVE XFAIL.
* gcc.dg/vect/bb-slp-pr69907.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-2.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
* gcc.dg/vect/slp-23.c: Likewise.
* gcc.dg/vect/slp-perm-5.c: Likewise.
* gcc.dg/vect/slp-perm-6.c: Likewise.
* gcc.dg/vect/slp-perm-9.c: Likewise.
* gcc.dg/vect/slp-reduc-3.c: Likewise.
* gcc.dg/vect/vect-114.c: Likewise.
* gcc.dg/vect/vect-mult-const-pattern-1.c: Likewise.
* gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256613
|
|
This patch adds support for ARM's Scalable Vector Extension.
The patch just contains the core features that work with the
current vectoriser framework; later patches will add extra
capabilities to both the target-independent code and AArch64 code.
The patch doesn't include:
- support for unwinding frames whose size depends on the vector length
- modelling the effect of __tls_get_addr on the SVE registers
These are handled by later patches instead.
Some notes:
- The copyright years for aarch64-sve.md start at 2009 because some of
the code is based on aarch64.md, which also starts from then.
- The patch inserts spaces between items in the AArch64 section
of sourcebuild.texi. This matches at least the surrounding
architectures and looks a little nicer in the info output.
- aarch64-sve.md includes a pattern:
while_ult<GPI:mode><PRED_ALL:mode>
A later patch adds a matching "while_ult" optab, but the pattern
is also needed by the predicate vec_duplicate expander.
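For reference, a scalar model of the while_ult/WHILELO semantics (my
illustration, not GCC code): lane I of the result predicate is active
while BASE + I compares below LIMIT, unsigned.
/* Scalar model of WHILELO: the active lanes form a prefix of the
   predicate, ending where the unsigned bound check first fails.  */
void
while_ult (unsigned char *pred, unsigned long base,
           unsigned long limit, int lanes)
{
  for (int i = 0; i < lanes; i++)
    pred[i] = base + (unsigned long) i < limit;
}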
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/invoke.texi (-msve-vector-bits=): Document new option.
(sve): Document new AArch64 extension.
* doc/md.texi (w): Extend the description of the AArch64
constraint to include SVE vectors.
(Upl, Upa): Document new AArch64 predicate constraints.
* config/aarch64/aarch64-opts.h (aarch64_sve_vector_bits_enum): New
enum.
* config/aarch64/aarch64.opt (sve_vector_bits): New enum.
(msve-vector-bits=): New option.
* config/aarch64/aarch64-option-extensions.def (fp, simd): Disable
SVE when these are disabled.
(sve): New extension.
* config/aarch64/aarch64-modes.def: Define SVE vector and predicate
modes. Adjust their number of units based on aarch64_sve_vg.
(MAX_BITSIZE_MODE_ANY_MODE): Define.
* config/aarch64/aarch64-protos.h (ADDR_QUERY_ANY): New
aarch64_addr_query_type.
(aarch64_const_vec_all_same_in_range_p, aarch64_sve_pred_mode)
(aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p)
(aarch64_sve_inc_dec_immediate_p, aarch64_add_offset_temporaries)
(aarch64_split_add_offset, aarch64_output_sve_cnt_immediate)
(aarch64_output_sve_addvl_addpl, aarch64_output_sve_inc_dec_immediate)
(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): Declare.
(aarch64_simd_imm_zero_p): Delete.
(aarch64_check_zero_based_sve_index_immediate): Declare.
(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): Likewise.
(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
rather than an rtx.
(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): Declare.
(aarch64_expand_mov_immediate): Take a gen_vec_duplicate callback.
(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move): Declare.
(aarch64_expand_sve_vec_cmp_int, aarch64_expand_sve_vec_cmp_float)
(aarch64_expand_sve_vcond, aarch64_expand_sve_vec_perm): Declare.
(aarch64_regmode_natural_size): Likewise.
* config/aarch64/aarch64.h (AARCH64_FL_SVE): New macro.
(AARCH64_FL_V8_3, AARCH64_FL_RCPC, AARCH64_FL_DOTPROD): Shift
left one place.
(AARCH64_ISA_SVE, TARGET_SVE): New macros.
(FIXED_REGISTERS, CALL_USED_REGISTERS, REGISTER_NAMES): Add entries
for VG and the SVE predicate registers.
(V_ALIASES): Add a "z"-prefixed alias.
(FIRST_PSEUDO_REGISTER): Change to P15_REGNUM + 1.
(AARCH64_DWARF_VG, AARCH64_DWARF_P0): New macros.
(PR_REGNUM_P, PR_LO_REGNUM_P): Likewise.
(PR_LO_REGS, PR_HI_REGS, PR_REGS): New reg_classes.
(REG_CLASS_NAMES): Add entries for them.
(REG_CLASS_CONTENTS): Likewise. Update ALL_REGS to include VG
and the predicate registers.
(aarch64_sve_vg): Declare.
(BITS_PER_SVE_VECTOR, BYTES_PER_SVE_VECTOR, BYTES_PER_SVE_PRED)
(SVE_BYTE_MODE, MAX_COMPILE_TIME_VEC_BYTES): New macros.
(REGMODE_NATURAL_SIZE): Define.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Handle
SVE macros.
* config/aarch64/aarch64.c: Include cfgrtl.h.
(simd_immediate_info): Add a constructor for series vectors,
and an associated step field.
(aarch64_sve_vg): New variable.
(aarch64_dbx_register_number): Handle VG and the predicate registers.
(aarch64_vect_struct_mode_p, aarch64_vector_mode_p): Delete.
(VEC_ADVSIMD, VEC_SVE_DATA, VEC_SVE_PRED, VEC_STRUCT, VEC_ANY_SVE)
(VEC_ANY_DATA): New constants.
(aarch64_advsimd_struct_mode_p, aarch64_sve_pred_mode_p)
(aarch64_classify_vector_mode, aarch64_vector_data_mode_p)
(aarch64_sve_data_mode_p, aarch64_sve_pred_mode)
(aarch64_get_mask_mode): New functions.
(aarch64_hard_regno_nregs): Handle SVE data modes for FP_REGS
and FP_LO_REGS. Handle PR_REGS, PR_LO_REGS and PR_HI_REGS.
(aarch64_hard_regno_mode_ok): Handle VG. Also handle the SVE
predicate modes and predicate registers. Explicitly restrict
GPRs to modes of 16 bytes or smaller. Only allow FP registers
to store a vector mode if it is recognized by
aarch64_classify_vector_mode.
(aarch64_regmode_natural_size): New function.
(aarch64_hard_regno_caller_save_mode): Return the original mode
for predicates.
(aarch64_sve_cnt_immediate_p, aarch64_output_sve_cnt_immediate)
(aarch64_sve_addvl_addpl_immediate_p, aarch64_output_sve_addvl_addpl)
(aarch64_sve_inc_dec_immediate_p, aarch64_output_sve_inc_dec_immediate)
(aarch64_add_offset_1_temporaries, aarch64_offset_temporaries): New
functions.
(aarch64_add_offset): Add a temp2 parameter. Assert that temp1
does not overlap dest if the function is frame-related. Handle
SVE constants.
(aarch64_split_add_offset): New function.
(aarch64_add_sp, aarch64_sub_sp): Add temp2 parameters and pass
them to aarch64_add_offset.
(aarch64_allocate_and_probe_stack_space): Add a temp2 parameter
and update call to aarch64_sub_sp.
(aarch64_add_cfa_expression): New function.
(aarch64_expand_prologue): Pass extra temporary registers to the
functions above. Handle the case in which we need to emit new
DW_CFA_expressions for registers that were originally saved
relative to the stack pointer, but now have to be expressed
relative to the frame pointer.
(aarch64_output_mi_thunk): Pass extra temporary registers to the
functions above.
(aarch64_expand_epilogue): Likewise. Prevent inheritance of
IP0 and IP1 values for SVE frames.
(aarch64_expand_vec_series): New function.
(aarch64_expand_sve_widened_duplicate): Likewise.
(aarch64_expand_sve_const_vector): Likewise.
(aarch64_expand_mov_immediate): Add a gen_vec_duplicate parameter.
Handle SVE constants. Use emit_move_insn to move a force_const_mem
into the register, rather than emitting a SET directly.
(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move)
(aarch64_get_reg_raw_mode, offset_4bit_signed_scaled_p)
(offset_6bit_unsigned_scaled_p, aarch64_offset_7bit_signed_scaled_p)
(offset_9bit_signed_scaled_p): New functions.
(aarch64_replicate_bitmask_imm): New function.
(aarch64_bitmask_imm): Use it.
(aarch64_cannot_force_const_mem): Reject expressions involving
a CONST_POLY_INT. Update call to aarch64_classify_symbol.
(aarch64_classify_index): Handle SVE indices, by requiring
a plain register index with a scale that matches the element size.
(aarch64_classify_address): Handle SVE addresses. Assert that
the mode of the address is VOIDmode or an integer mode.
Update call to aarch64_classify_symbol.
(aarch64_classify_symbolic_expression): Update call to
aarch64_classify_symbol.
(aarch64_const_vec_all_in_range_p): New function.
(aarch64_print_vector_float_operand): Likewise.
(aarch64_print_operand): Handle 'N' and 'C'. Use "zN" rather than
"vN" for FP registers with SVE modes. Handle (const ...) vectors
and the FP immediates 1.0 and 0.5.
(aarch64_print_address_internal): Handle SVE addresses.
(aarch64_print_operand_address): Use ADDR_QUERY_ANY.
(aarch64_regno_regclass): Handle predicate registers.
(aarch64_secondary_reload): Handle big-endian reloads of SVE
data modes.
(aarch64_class_max_nregs): Handle SVE modes and predicate registers.
(aarch64_rtx_costs): Check for ADDVL and ADDPL instructions.
(aarch64_convert_sve_vector_bits): New function.
(aarch64_override_options): Use it to handle -msve-vector-bits=.
(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
rather than an rtx.
(aarch64_legitimate_constant_p): Use aarch64_classify_vector_mode.
Handle SVE vector and predicate modes. Accept VL-based constants
that need only one temporary register, and VL offsets that require
no temporary registers.
(aarch64_conditional_register_usage): Mark the predicate registers
as fixed if SVE isn't available.
(aarch64_vector_mode_supported_p): Use aarch64_classify_vector_mode.
Return true for SVE vector and predicate modes.
(aarch64_simd_container_mode): Take the number of bits as a poly_int64
rather than an unsigned int. Handle SVE modes.
(aarch64_preferred_simd_mode): Update call accordingly. Handle
SVE modes.
(aarch64_autovectorize_vector_sizes): Add BYTES_PER_SVE_VECTOR
if SVE is enabled.
(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): New functions.
(aarch64_sve_valid_immediate): New function.
(aarch64_simd_valid_immediate): Use it as the fallback for SVE vectors.
Explicitly reject structure modes. Check for INDEX constants.
Handle PTRUE and PFALSE constants.
(aarch64_check_zero_based_sve_index_immediate): New function.
(aarch64_simd_imm_zero_p): Delete.
(aarch64_mov_operand_p): Use aarch64_simd_valid_immediate for
vector modes. Accept constants in the range of CNT[BHWD].
(aarch64_simd_scalar_immediate_valid_for_move): Explicitly
ask for an Advanced SIMD mode.
(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): New functions.
(aarch64_simd_vector_alignment): Handle SVE predicates.
(aarch64_vectorize_preferred_vector_alignment): New function.
(aarch64_simd_vector_alignment_reachable): Use it instead of
the vector size.
(aarch64_shift_truncation_mask): Use aarch64_vector_data_mode_p.
(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): New
functions.
(MAX_VECT_LEN): Delete.
(expand_vec_perm_d): Add a vec_flags field.
(emit_unspec2, aarch64_expand_sve_vec_perm): New functions.
(aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_zip)
(aarch64_evpc_ext): Don't apply a big-endian lane correction
for SVE modes.
(aarch64_evpc_rev): Rename to...
(aarch64_evpc_rev_local): ...this. Use a predicated operation for SVE.
(aarch64_evpc_rev_global): New function.
(aarch64_evpc_dup): Enforce a 64-byte range for SVE DUP.
(aarch64_evpc_tbl): Use MAX_COMPILE_TIME_VEC_BYTES instead of
MAX_VECT_LEN.
(aarch64_evpc_sve_tbl): New function.
(aarch64_expand_vec_perm_const_1): Update after rename of
aarch64_evpc_rev. Handle SVE permutes too, trying
aarch64_evpc_rev_global and using aarch64_evpc_sve_tbl rather
than aarch64_evpc_tbl.
(aarch64_vectorize_vec_perm_const): Initialize vec_flags.
(aarch64_sve_cmp_operand_p, aarch64_unspec_cond_code)
(aarch64_gen_unspec_cond, aarch64_expand_sve_vec_cmp_int)
(aarch64_emit_unspec_cond, aarch64_emit_unspec_cond_or)
(aarch64_emit_inverted_unspec_cond, aarch64_expand_sve_vec_cmp_float)
(aarch64_expand_sve_vcond): New functions.
(aarch64_modes_tieable_p): Use aarch64_vector_data_mode_p instead
of aarch64_vector_mode_p.
(aarch64_dwarf_poly_indeterminate_value): New function.
(aarch64_compute_pressure_classes): Likewise.
(aarch64_can_change_mode_class): Likewise.
(TARGET_GET_RAW_RESULT_MODE, TARGET_GET_RAW_ARG_MODE): Redefine.
(TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Likewise.
(TARGET_VECTORIZE_GET_MASK_MODE): Likewise.
(TARGET_DWARF_POLY_INDETERMINATE_VALUE): Likewise.
(TARGET_COMPUTE_PRESSURE_CLASSES): Likewise.
(TARGET_CAN_CHANGE_MODE_CLASS): Likewise.
* config/aarch64/constraints.md (Upa, Upl, Uav, Uat, Usv, Usi, Utr)
(Uty, Dm, vsa, vsc, vsd, vsi, vsn, vsl, vsm, vsA, vsM, vsN): New
constraints.
(Dn, Dl, Dr): Accept const as well as const_vector.
(Dz): Likewise. Compare against CONST0_RTX.
* config/aarch64/iterators.md: Refer to "Advanced SIMD" instead
of "vector" where appropriate.
(SVE_ALL, SVE_BH, SVE_BHS, SVE_BHSI, SVE_HSDI, SVE_HSF, SVE_SD)
(SVE_SDI, SVE_I, SVE_F, PRED_ALL, PRED_BHS): New mode iterators.
(UNSPEC_SEL, UNSPEC_ANDF, UNSPEC_IORF, UNSPEC_XORF, UNSPEC_COND_LT)
(UNSPEC_COND_LE, UNSPEC_COND_EQ, UNSPEC_COND_NE, UNSPEC_COND_GE)
(UNSPEC_COND_GT, UNSPEC_COND_LO, UNSPEC_COND_LS, UNSPEC_COND_HS)
(UNSPEC_COND_HI, UNSPEC_COND_UO): New unspecs.
(Vetype, VEL, Vel, VWIDE, Vwide, vw, vwcore, V_INT_EQUIV)
(v_int_equiv): Extend to SVE modes.
(Vesize, V128, v128, Vewtype, V_FP_EQUIV, v_fp_equiv, VPRED): New
mode attributes.
(LOGICAL_OR, SVE_INT_UNARY, SVE_FP_UNARY): New code iterators.
(optab): Handle popcount, smin, smax, umin, umax, abs and sqrt.
(logical_nn, lr, sve_int_op, sve_fp_op): New code attributes.
(LOGICALF, OPTAB_PERMUTE, UNPACK, UNPACK_UNSIGNED, SVE_COND_INT_CMP)
(SVE_COND_FP_CMP): New int iterators.
(perm_hilo): Handle the new unpack unspecs.
(optab, logicalf_op, su, perm_optab, cmp_op, imm_con): New int
attributes.
* config/aarch64/predicates.md (aarch64_sve_cnt_immediate)
(aarch64_sve_addvl_addpl_immediate, aarch64_split_add_offset_immediate)
(aarch64_pluslong_or_poly_operand, aarch64_nonmemory_operand)
(aarch64_equality_operator, aarch64_constant_vector_operand)
(aarch64_sve_ld1r_operand, aarch64_sve_ldr_operand): New predicates.
(aarch64_sve_nonimmediate_operand): Likewise.
(aarch64_sve_general_operand): Likewise.
(aarch64_sve_dup_operand, aarch64_sve_arith_immediate): Likewise.
(aarch64_sve_sub_arith_immediate, aarch64_sve_inc_dec_immediate)
(aarch64_sve_logical_immediate, aarch64_sve_mul_immediate): Likewise.
(aarch64_sve_dup_immediate, aarch64_sve_cmp_vsc_immediate): Likewise.
(aarch64_sve_cmp_vsd_immediate, aarch64_sve_index_immediate): Likewise.
(aarch64_sve_float_arith_immediate): Likewise.
(aarch64_sve_float_arith_with_sub_immediate): Likewise.
(aarch64_sve_float_mul_immediate, aarch64_sve_arith_operand): Likewise.
(aarch64_sve_add_operand, aarch64_sve_logical_operand): Likewise.
(aarch64_sve_lshift_operand, aarch64_sve_rshift_operand): Likewise.
(aarch64_sve_mul_operand, aarch64_sve_cmp_vsc_operand): Likewise.
(aarch64_sve_cmp_vsd_operand, aarch64_sve_index_operand): Likewise.
(aarch64_sve_float_arith_operand): Likewise.
(aarch64_sve_float_arith_with_sub_operand): Likewise.
(aarch64_sve_float_mul_operand): Likewise.
(aarch64_sve_vec_perm_operand): Likewise.
(aarch64_pluslong_operand): Include aarch64_sve_addvl_addpl_immediate.
(aarch64_mov_operand): Accept const_poly_int and const_vector.
(aarch64_simd_lshift_imm, aarch64_simd_rshift_imm): Accept const
as well as const_vector.
(aarch64_simd_imm_zero, aarch64_simd_imm_minus_one): Move earlier
in file. Use CONST0_RTX and CONSTM1_RTX.
(aarch64_simd_or_scalar_imm_zero): Likewise. Add match_codes.
(aarch64_simd_reg_or_zero): Accept const as well as const_vector.
Use aarch64_simd_imm_zero.
* config/aarch64/aarch64-sve.md: New file.
* config/aarch64/aarch64.md: Include it.
(VG_REGNUM, P0_REGNUM, P7_REGNUM, P15_REGNUM): New register numbers.
(UNSPEC_REV, UNSPEC_LD1_SVE, UNSPEC_ST1_SVE, UNSPEC_MERGE_PTRUE)
(UNSPEC_PTEST_PTRUE, UNSPEC_UNPACKSHI, UNSPEC_UNPACKUHI)
(UNSPEC_UNPACKSLO, UNSPEC_UNPACKULO, UNSPEC_PACK)
(UNSPEC_FLOAT_CONVERT, UNSPEC_WHILE_LO): New unspec constants.
(sve): New attribute.
(enabled): Disable instructions with the sve attribute unless
TARGET_SVE.
(movqi, movhi): Pass CONST_POLY_INT operands through
aarch64_expand_mov_immediate.
(*mov<mode>_aarch64, *movsi_aarch64, *movdi_aarch64): Handle
CNT[BHSD] immediates.
(movti): Split CONST_POLY_INT moves into two halves.
(add<mode>3): Accept aarch64_pluslong_or_poly_operand.
Split additions that need a temporary here if the destination
is the stack pointer.
(*add<mode>3_aarch64): Handle ADDVL and ADDPL immediates.
(*add<mode>3_poly_1): New instruction.
(set_clobber_cc): New expander.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256612
|
|
Until SLP support for variable-length vectors is added, many tests
fall back to non-SLP vectorisation with permutes.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
gcc/testsuite/
* gcc.dg/vect/no-scevccp-slp-30.c: XFAIL SLP test for
vect_variable_length, expecting the test to be vectorized
without SLP instead.
* gcc.dg/vect/pr33953.c: Likewise.
* gcc.dg/vect/pr37027.c: Likewise.
* gcc.dg/vect/pr67790.c: Likewise.
* gcc.dg/vect/pr68445.c: Likewise.
* gcc.dg/vect/slp-1.c: Likewise.
* gcc.dg/vect/slp-10.c: Likewise.
* gcc.dg/vect/slp-12a.c: Likewise.
* gcc.dg/vect/slp-12b.c: Likewise.
* gcc.dg/vect/slp-12c.c: Likewise.
* gcc.dg/vect/slp-13-big-array.c: Likewise.
* gcc.dg/vect/slp-13.c: Likewise.
* gcc.dg/vect/slp-14.c: Likewise.
* gcc.dg/vect/slp-15.c: Likewise.
* gcc.dg/vect/slp-17.c: Likewise.
* gcc.dg/vect/slp-19b.c: Likewise.
* gcc.dg/vect/slp-2.c: Likewise.
* gcc.dg/vect/slp-20.c: Likewise.
* gcc.dg/vect/slp-21.c: Likewise.
* gcc.dg/vect/slp-22.c: Likewise.
* gcc.dg/vect/slp-24-big-array.c: Likewise.
* gcc.dg/vect/slp-24.c: Likewise.
* gcc.dg/vect/slp-28.c: Likewise.
* gcc.dg/vect/slp-39.c: Likewise.
* gcc.dg/vect/slp-42.c: Likewise.
* gcc.dg/vect/slp-6.c: Likewise.
* gcc.dg/vect/slp-7.c: Likewise.
* gcc.dg/vect/slp-cond-1.c: Likewise.
* gcc.dg/vect/slp-cond-2-big-array.c: Likewise.
* gcc.dg/vect/slp-cond-2.c: Likewise.
* gcc.dg/vect/slp-multitypes-1.c: Likewise.
* gcc.dg/vect/slp-multitypes-10.c: Likewise.
* gcc.dg/vect/slp-multitypes-12.c: Likewise.
* gcc.dg/vect/slp-multitypes-2.c: Likewise.
* gcc.dg/vect/slp-multitypes-4.c: Likewise.
* gcc.dg/vect/slp-multitypes-5.c: Likewise.
* gcc.dg/vect/slp-multitypes-8.c: Likewise.
* gcc.dg/vect/slp-multitypes-9.c: Likewise.
* gcc.dg/vect/slp-reduc-1.c: Likewise.
* gcc.dg/vect/slp-reduc-2.c: Likewise.
* gcc.dg/vect/slp-reduc-4.c: Likewise.
* gcc.dg/vect/slp-reduc-5.c: Likewise.
* gcc.dg/vect/slp-reduc-7.c: Likewise.
* gcc.dg/vect/slp-widen-mult-half.c: Likewise.
* gcc.dg/vect/vect-live-slp-1.c: Likewise.
* gcc.dg/vect/vect-live-slp-2.c: Likewise.
* gcc.dg/vect/vect-live-slp-3.c: Likewise.
From-SVN: r256611
|
|
The SVE support for the new CONST_VECTOR encoding needs to be able
to extract the first N bits of the vector and duplicate it. This patch
adds a simplify_subreg rule for that.
The code is covered by the gcc.target/aarch64/sve_slp_*.c tests.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* simplify-rtx.c (simplify_immed_subreg): Add an inner_bytes
parameter and use it instead of GET_MODE_SIZE (innermode). Use
inner_bytes * BITS_PER_UNIT instead of GET_MODE_BITSIZE (innermode).
Use CEIL (inner_bytes, GET_MODE_UNIT_SIZE (innermode)) instead of
GET_MODE_NUNITS (innermode). Also add a first_elem parameter.
Change innermode from fixed_mode_size to machine_mode.
(simplify_subreg): Update call accordingly. Handle a constant-sized
subreg of a variable-length CONST_VECTOR.
From-SVN: r256610
|
|
A general TARGET_MEM_REF is:
BASE + STEP * INDEX + INDEX2 + OFFSET
After classifying the address in this way, the code that builds
TARGET_MEM_REFs tries to simplify the address until it's valid
for the current target and for the mode of memory being addressed.
It does this in a fixed order:
(1) add SYMBOL to BASE
(2) add INDEX * STEP to the base, if STEP != 1
(3) add OFFSET to INDEX or BASE (reverted if unsuccessful)
(4) add INDEX to BASE
(5) add OFFSET to BASE
So suppose we had an address:
&symbol + offset + index * 8
(e.g. a[i + 1] for a global "a") on a target that allows only an index
or an offset, not both. Following the steps above, we'd first create:
tmp = symbol
tmp2 = tmp + index * 8
Then if the given offset value was valid for the mode being addressed,
we'd create:
MEM[base:tmp2, offset:offset]
while if it was invalid we'd create:
tmp3 = tmp2 + offset
MEM[base:tmp3, offset:0]
The problem is that this fallback could be used when ivopts had decided
to use a scaled index for an address that happens to have a constant base.
The old procedure failed to give an indexed TARGET_MEM_REF in that case,
and adding the offset last prevented later passes from being able to
fold the index back in.
The patch avoids this by checking at (2) whether the offset is the
only component that causes the address to be invalid, folding it
into the base if so.
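In C terms, the motivating access has roughly this shape (my
reconstruction of the a[i + 1] example above):
long a[1024];

long
sum_next (int n)
{
  long s = 0;
  for (int i = 0; i < n; i++)
    /* On LP64 this is &a + 8 + i * 8: constant symbol base,
       constant offset, and a scaled index chosen by ivopts.  */
    s += a[i + 1];
  return s;
}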
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-ssa-address.c (mem_ref_valid_without_offset_p): New function.
(add_offset_to_base): New function, split out from...
(create_mem_ref): ...here. When handling a scale other than 1,
check first whether the address is valid without the offset.
Add it into the base if so, leaving the index and scale as-is.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256609
|
|
PR c/83801
* c-tree.h (decl_constant_value_1): Add a bool argument.
* c-typeck.c (decl_constant_value_1): Add IN_INIT argument, allow
returning a CONSTRUCTOR if it is true. Use error_operand_p.
(decl_constant_value): Adjust caller.
* c-fold.c (c_fully_fold_internal): If in_init, pass true to
decl_constant_value_1 as IN_INIT. Otherwise, punt if
decl_constant_value returns initializer that has BLKmode or
array type.
(c_fully_fold_internal) <case COMPONENT_REF>: Fold if !lval.
* gcc.dg/pr83801.c: New test.
From-SVN: r256608
|
|
unallocated LHS)
2018-01-13 Paul Thomas <pault@gcc.gnu.org>
PR fortran/52162
* trans-expr.c (gfc_trans_scalar_assign): Flag is_alloc_lhs if
the rhs expression is neither an elemental nor a conversion
function.
PR fortran/83622
* trans-array.c (is_pointer_array): Remove unconditional return
of false for -fopenmp.
2018-01-13 Paul Thomas <pault@gcc.gnu.org>
PR fortran/52162
* gfortran.dg/bounds_check_19.f90: New test.
From-SVN: r256607
|
|
lists does not close paren.)
2018-01-13 Thomas Koenig <tkoenig@gcc.gnu.org>
<emsr@gcc.gnu.org>
PR fortran/83803
* dump-parse-tree.c (write_proc): Always emit closing parenthesis
for functions.
From-SVN: r256606
|
|
From-SVN: r256602
|
|
PR c++/83778
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Call
fold_for_warn before checking if arg2 is INTEGER_CST.
From-SVN: r256599
|
|
-mstring is enabled by default only on the 601, and with -Os on some
configurations. It is almost always slower than not using it and
rarely leads to smaller code.
This patch disables it. A user who passes -mstring gets a warning
(but -mno-string does not warn). I left the target attribute in
place; it just doesn't do anything anymore.
The patch also deletes a whole bunch of code. The 'N' and 'O' output
modifiers are now unused, but now is not the time to delete them.
* config/rs6000/predicates.md (load_multiple_operation): Delete.
(store_multiple_operation): Delete.
* config/rs6000/rs6000-cpus.def (601): Remove MASK_STRING.
* config/rs6000/rs6000-protos.h (rs6000_output_load_multiple): Delete.
* config/rs6000/rs6000-string.c (expand_block_move): Delete everything
guarded by TARGET_STRING.
(rs6000_output_load_multiple): Delete.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Delete
OPTION_MASK_STRING / TARGET_STRING handling.
(print_operand) <'N', 'O'>: Add comment that these are unused now.
(const rs6000_opt_masks) <"string">: Change mask to 0.
* config/rs6000/rs6000.h (TARGET_DEFAULT): Remove MASK_STRING.
(MASK_STRING): Delete.
* config/rs6000/rs6000.md (*mov<mode>_string): Delete TARGET_STRING
parts. Simplify.
(load_multiple): Delete.
(*ldmsi8): Delete.
(*ldmsi7): Delete.
(*ldmsi6): Delete.
(*ldmsi5): Delete.
(*ldmsi4): Delete.
(*ldmsi3): Delete.
(store_multiple): Delete.
(*stmsi8): Delete.
(*stmsi7): Delete.
(*stmsi6): Delete.
(*stmsi5): Delete.
(*stmsi4): Delete.
(*stmsi3): Delete.
(movmemsi_8reg): Delete.
(corresponding unnamed define_insn): Delete.
(movmemsi_6reg): Delete.
(corresponding unnamed define_insn): Delete.
(movmemsi_4reg): Delete.
(corresponding unnamed define_insn): Delete.
(movmemsi_2reg): Delete.
(corresponding unnamed define_insn): Delete.
(movmemsi_1reg): Delete.
(corresponding unnamed define_insn): Delete.
* config/rs6000/rs6000.opt (mno-string): New.
(mstring): Replace by deprecation warning stub.
* doc/invoke.texi (RS/6000 and PowerPC Options): Delete -mstring.
From-SVN: r256598
|
|
xsnabsqp.
* gcc.target/powerpc/float128-hw7.c: Use scan-assembler-times
instead of scan-assembler-not for xsnabsqp.
From-SVN: r256597
|
|
times, try to reuse last created gen_raw_REG.
* regrename.c (regrename_do_replace): If replacing the same
reg multiple times, try to reuse last created gen_raw_REG.
From-SVN: r256596
|
|
internal file (characters) was read previously.)
2018-01-12 Jerry DeLisle <jvdelisle@gcc.gnu.org>
PR libgfortran/83525
* gfortran.dg/newunit_5.f90: New test.
From-SVN: r256595
|
|
* typeck.c (build_static_cast): Use build_non_dependent_expr.
From-SVN: r256594
|
|
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01107.html
* cp-tree.h (mark_rvalue_use): Add parm name.
* expr.c (mark_lvalue_use, mark_lvalue_use_nonread): Move next to
mark_rvalue_use.
* call.c (convert_like_real): Fix formatting.
From-SVN: r256593
|
|
PR debug/81155
* bb-reorder.c (pass_partition_blocks::gate): In lto don't partition
main, to work around a bug in GDB.
From-SVN: r256592
|
|
2018-01-12 Tom de Vries <tom@codesourcery.com>
PR target/83737
* config.gcc (nvptx*-*-*): Set use_gcc_stdint=wrap.
From-SVN: r256591
|
|
2018-01-12 Vladimir Makarov <vmakarov@redhat.com>
PR rtl-optimization/80481
* ira-color.c (get_cap_member): New function.
(allocnos_conflict_by_live_ranges_p): Use it.
(slot_coalesced_allocno_live_ranges_intersect_p): Add assert.
(setup_slot_coalesced_allocno_live_ranges): Ditto.
2018-01-12 Vladimir Makarov <vmakarov@redhat.com>
PR rtl-optimization/80481
* g++.dg/pr80481.C: New.
From-SVN: r256590
|
|
on alpha)
PR target/83628
* config/alpha/alpha.md (*saddsi_1): New insn_and_split pattern.
(*saddl_se_1): Ditto.
(*ssubsi_1): Ditto.
(*ssubl_se_1): Ditto.
testsuite/ChangeLog:
PR target/83628
* gcc.target/alpha/pr83628-3.c: New test.
From-SVN: r256589
|
|
* lib/target-supports.exp (check_effective_target_avx512f): Also
check for __builtin_ia32_addsd_round,
__builtin_ia32_getmantsd_round.
* gcc.target/i386/i386.exp (check_effective_target_avx512f):
Remove.
From-SVN: r256588
|