aboutsummaryrefslogtreecommitdiff
path: root/gcc/optabs.def
AgeCommit message (Collapse)AuthorFilesLines
2020-11-19[2/3] [vect] Add widening add, subtract patternsJoel Hutton1-0/+8
Add widening add, subtract patterns to tree-vect-patterns. Update the widened code of patterns that detect PLUS_EXPR to also detect WIDEN_PLUS_EXPR. These patterns take 2 vectors with N elements of size S and perform an add/subtract on the elements, storing the results as N elements of size 2*S (in 2 result vectors). This is implemented in the aarch64 backend as addl,addl2 and subl,subl2 respectively. Add aarch64 tests for patterns. gcc/ChangeLog: * doc/generic.texi: Document new widen_plus/minus_lo/hi tree codes. * doc/md.texi: Document new widenening add/subtract hi/lo optabs. * expr.c (expand_expr_real_2): Add widen_add, widen_subtract cases. * optabs-tree.c (optab_for_tree_code): Add case for widening optabs. * optabs.def (OPTAB_D): Define vectorized widen add, subtracts. * tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds, subtracts. * tree-inline.c (estimate_operator_cost): Add case for widening adds, subtracts. * tree-vect-generic.c (expand_vector_operations_1): Add case for widening adds, subtracts * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog pattern. (vect_recog_widen_sub_pattern): New recog pattern. (vect_recog_average_pattern): Update widened add code. (vect_recog_average_pattern): Update widened add code. * tree-vect-stmts.c (vectorizable_conversion): Add case for widened add, subtract. (supportable_widening_operation): Add case for widened add, subtract. * tree.def (WIDEN_PLUS_EXPR): New tree code. (WIDEN_MINUS_EXPR): New tree code. (VEC_WIDEN_ADD_HI_EXPR): New tree code. (VEC_WIDEN_PLUS_LO_EXPR): New tree code. (VEC_WIDEN_MINUS_HI_EXPR): New tree code. (VEC_WIDEN_MINUS_LO_EXPR): New tree code. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vect-widen-add.c: New test. * gcc.target/aarch64/vect-widen-sub.c: New test.
2020-07-08IFN/optabs: Support vector load/store with lengthKewen Lin1-0/+2
This patch is to add the internal function and optabs support for vector load/store with length. For the vector load/store with length optab, the length item would be measured in lanes by default. For the targets which support length measured in bytes like Power, they should only define VnQI modes to wrap the other same size vector modes. If the length is larger than total lane/byte count of the given mode, the behavior is undefined. For the remaining lanes/bytes which isn't specified by length, they would be taken as undefined value. gcc/ChangeLog: * doc/md.texi (len_load_@var{m}): Document. (len_store_@var{m}): Likewise. * internal-fn.c (len_load_direct): New macro. (len_store_direct): Likewise. (expand_len_load_optab_fn): Likewise. (expand_len_store_optab_fn): Likewise. (direct_len_load_optab_supported_p): Likewise. (direct_len_store_optab_supported_p): Likewise. (expand_mask_load_optab_fn): New macro. Original renamed to ... (expand_partial_load_optab_fn): ... here. Add handlings for len_load_optab. (expand_mask_store_optab_fn): New macro. Original renamed to ... (expand_partial_store_optab_fn): ... here. Add handlings for len_store_optab. (internal_load_fn_p): Handle IFN_LEN_LOAD. (internal_store_fn_p): Handle IFN_LEN_STORE. (internal_fn_stored_value_index): Handle IFN_LEN_STORE. * internal-fn.def (LEN_LOAD): New internal function. (LEN_STORE): Likewise. * optabs.def (len_load_optab, len_store_optab): New optab.
2020-01-01Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r279813
2019-11-18Add optabs for accelerating RAW and WAR alias checksRichard Sandiford1-0/+3
This patch adds optabs that check whether a read followed by a write or a write followed by a read can be divided into interleaved byte accesses without changing the dependencies between the bytes. This is one of the uses of the SVE2 WHILERW and WHILEWR instructions. (The instructions can also be used to limit the VF at runtime, but that's future work.) 2019-11-18 Richard Sandiford <richard.sandiford@arm.com> gcc/ * doc/sourcebuild.texi (vect_check_ptrs): Document. * optabs.def (check_raw_ptrs_optab, check_war_ptrs_optab): New optabs. * doc/md.texi: Document them. * internal-fn.def (IFN_CHECK_RAW_PTRS, IFN_CHECK_WAR_PTRS): New internal functions. * internal-fn.h (internal_check_ptrs_fn_supported_p): Declare. * internal-fn.c (check_ptrs_direct): New macro. (expand_check_ptrs_optab_fn): Likewise. (direct_check_ptrs_optab_supported_p): Likewise. (internal_check_ptrs_fn_supported_p): New fuction. * tree-data-ref.c: Include internal-fn.h. (create_ifn_alias_checks): New function. (create_intersect_range_checks): Use it. * config/aarch64/iterators.md (SVE2_WHILE_PTR): New int iterator. (optab, cmp_op): Handle it. (raw_war, unspec): New int attributes. * config/aarch64/aarch64.md (UNSPEC_WHILERW, UNSPEC_WHILE_WR): New constants. * config/aarch64/predicates.md (aarch64_bytes_per_sve_vector_operand): New predicate. * config/aarch64/aarch64-sve2.md (check_<raw_war>_ptrs<mode>): New expander. (@aarch64_sve2_while<cmp_op><GPI:mode><PRED_ALL:mode>_ptest): New pattern. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_check_ptrs): New procedure. * gcc.dg/vect/vect-alias-check-14.c: Expect IFN_CHECK_WAR to be used, if available. * gcc.dg/vect/vect-alias-check-15.c: Likewise. * gcc.dg/vect/vect-alias-check-16.c: Likewise IFN_CHECK_RAW. * gcc.target/aarch64/sve2/whilerw_1.c: New test. * gcc.target/aarch64/sve2/whilewr_1.c: Likewise. * gcc.target/aarch64/sve2/whilewr_2.c: Likewise. From-SVN: r278414
2019-11-08Generalise gather and scatter optabsRichard Sandiford1-5/+4
The gather and scatter optabs required the vector offset to be the integer equivalent of the vector mode being loaded or stored. This patch generalises them so that the two vectors can have different element sizes, although they still need to have the same number of elements. One consequence of this is that it's possible (if unlikely) for two IFN_GATHER_LOADs to have the same arguments but different return types. E.g. the same scalar base and vector of 32-bit offsets could be used to load 8-bit elements and to load 16-bit elements. From just looking at the arguments, we could wrongly deduce that they're equivalent. I know we saw this happen at one point with IFN_WHILE_ULT, and we dealt with it there by passing a zero of the return type as an extra argument. Doing the same here also makes the load and store functions have the same argument assignment. For now this patch should be a no-op, but later SVE patches take advantage of the new flexibility. 2019-11-08 Richard Sandiford <richard.sandiford@arm.com> gcc/ * optabs.def (gather_load_optab, mask_gather_load_optab) (scatter_store_optab, mask_scatter_store_optab): Turn into conversion optabs, with the offset mode given explicitly. * doc/md.texi: Update accordingly. * config/aarch64/aarch64-sve-builtins-base.cc (svld1_gather_impl::expand): Likewise. (svst1_scatter_impl::expand): Likewise. * internal-fn.c (gather_load_direct, scatter_store_direct): Likewise. (expand_scatter_store_optab_fn): Likewise. (direct_gather_load_optab_supported_p): Likewise. (direct_scatter_store_optab_supported_p): Likewise. (expand_gather_load_optab_fn): Likewise. Expect the mask argument to be argument 4. (internal_fn_mask_index): Return 4 for IFN_MASK_GATHER_LOAD. (internal_gather_scatter_fn_supported_p): Replace the offset sign argument with the offset vector type. Require the two vector types to have the same number of elements but allow their element sizes to be different. Treat the optabs as conversion optabs. * internal-fn.h (internal_gather_scatter_fn_supported_p): Update prototype accordingly. * optabs-query.c (supports_at_least_one_mode_p): Replace with... (supports_vec_convert_optab_p): ...this new function. (supports_vec_gather_load_p): Update accordingly. (supports_vec_scatter_store_p): Likewise. * tree-vectorizer.h (vect_gather_scatter_fn_p): Take a vec_info. Replace the offset sign and bits parameters with a scalar type tree. * tree-vect-data-refs.c (vect_gather_scatter_fn_p): Likewise. Pass back the offset vector type instead of the scalar element type. Allow the offset to be wider than the memory elements. Search for an offset type that the target supports, stopping once we've reached the maximum of the element size and pointer size. Update call to internal_gather_scatter_fn_supported_p. (vect_check_gather_scatter): Update calls accordingly. When testing a new scale before knowing the final offset type, check whether the scale is supported for any signed or unsigned offset type. Check whether the target supports the source and target types of a conversion before deciding whether to look through the conversion. Record the chosen offset_vectype. * tree-vect-patterns.c (vect_get_gather_scatter_offset_type): Delete. (vect_recog_gather_scatter_pattern): Get the scalar offset type directly from the gs_info's offset_vectype instead. Pass a zero of the result type to IFN_GATHER_LOAD and IFN_MASK_GATHER_LOAD. * tree-vect-stmts.c (check_load_store_masking): Update call to internal_gather_scatter_fn_supported_p, passing the offset vector type recorded in the gs_info. (vect_truncate_gather_scatter_offset): Update call to vect_check_gather_scatter, leaving it to search for a valid offset vector type. (vect_use_strided_gather_scatters_p): Convert the offset to the element type of the gs_info's offset_vectype. (vect_get_gather_scatter_ops): Get the offset vector type directly from the gs_info. (vect_get_strided_load_store_ops): Likewise. (vectorizable_load): Pass a zero of the result type to IFN_GATHER_LOAD and IFN_MASK_GATHER_LOAD. * config/aarch64/aarch64-sve.md (gather_load<mode>): Rename to... (gather_load<mode><v_int_equiv>): ...this. (mask_gather_load<mode>): Rename to... (mask_gather_load<mode><v_int_equiv>): ...this. (scatter_store<mode>): Rename to... (scatter_store<mode><v_int_equiv>): ...this. (mask_scatter_store<mode>): Rename to... (mask_scatter_store<mode><v_int_equiv>): ...this. From-SVN: r277949
2019-09-30[AArch64][SVE] Utilize ASRD instruction for division and remainderYuliang Wang1-0/+1
2019-09-30 Yuliang Wang <yuliang.wang@arm.com> gcc/ * config/aarch64/aarch64-sve.md (sdiv_pow2<mode>3): New pattern for ASRD. * config/aarch64/iterators.md (UNSPEC_ASRD): New unspec. * internal-fn.def (IFN_DIV_POW2): New internal function. * optabs.def (sdiv_pow2_optab): New optab. * tree-vect-patterns.c (vect_recog_divmod_pattern): Modify pattern to support new operation. * doc/md.texi (sdiv_pow2$var{m3}): Documentation for the above. * doc/sourcebuild.texi (vect_sdiv_pow2_si): Document new target selector. gcc/testsuite/ * gcc.dg/vect/vect-sdiv-pow2-1.c: New test. * gcc.target/aarch64/sve/asrdiv_1.c: As above. * lib/target-supports.exp (check_effective_target_vect_sdiv_pow2_si): Return true for AArch64 with SVE. From-SVN: r276343
2019-09-12Vectorise multiply high with scaling operations (PR 89386)Yuliang Wang1-0/+4
2019-09-12 Yuliang Wang <yuliang.wang@arm.com> gcc/ PR tree-optimization/89386 * config/aarch64/aarch64-sve2.md (<su>mull<bt><Vwide>) (<r>shrnb<mode>, <r>shrnt<mode>): New SVE2 patterns. (<su>mulh<r>s<mode>3): New pattern for MULHRS. * config/aarch64/iterators.md (UNSPEC_SMULLB, UNSPEC_SMULLT) (UNSPEC_UMULLB, UNSPEC_UMULLT, UNSPEC_SHRNB, UNSPEC_SHRNT) (UNSPEC_RSHRNB, UNSPEC_RSHRNT, UNSPEC_SMULHS, UNSPEC_SMULHRS) UNSPEC_UMULHS, UNSPEC_UMULHRS): New unspecs. (MULLBT, SHRNB, SHRNT, MULHRS): New int iterators. (su, r): Handle the unspecs above. (bt): New int attribute. * internal-fn.def (IFN_MULHS, IFN_MULHRS): New internal functions. * internal-fn.c (first_commutative_argument): Commutativity info for above. * optabs.def (smulhs_optab, smulhrs_optab, umulhs_optab) (umulhrs_optab): New optabs. * doc/md.texi (smulhs$var{m3}, umulhs$var{m3}) (smulhrs$var{m3}, umulhrs$var{m3}): Documentation for the above. * tree-vect-patterns.c (vect_recog_mulhs_pattern): New pattern function. (vect_vect_recog_func_ptrs): Add it. * testsuite/gcc.target/aarch64/sve2/mulhrs_1.c: New test. * testsuite/gcc.dg/vect/vect-mulhrs-1.c: As above. * testsuite/gcc.dg/vect/vect-mulhrs-2.c: As above. * testsuite/gcc.dg/vect/vect-mulhrs-3.c: As above. * testsuite/gcc.dg/vect/vect-mulhrs-4.c: As above. * doc/sourcebuild.texi (vect_mulhrs_hi): Document new target selector. * testsuite/lib/target-supports.exp (check_effective_target_vect_mulhrs_hi): Return true for AArch64 with SVE2. From-SVN: r275682
2019-08-26i386: Roundeven expansion for SSE4.1+Tejas Joshi1-0/+1
gcc/ChangeLog: 2019-08-26 Tejas Joshi <tejasjoshi9673@gmail.com> Uros Bizjak <ubizjak@gmail.com> * builtins.c (mathfn_built_in_2): Change CASE_MATHFN to CASE_MATHFN_FLOATN for roundeven. * config/i386/i386.c (ix86_i387_mode_needed): Add case I387_ROUNDEVEN. (ix86_mode_needed): Likewise. (ix86_mode_after): Likewise. (ix86_mode_entry): Likewise. (ix86_mode_exit): Likewise. (ix86_emit_mode_set): Likewise. (emit_i387_cw_initialization): Add case I387_CW_ROUNDEVEN. * config/i386/i386.h (ix86_stack_slot): Add SLOT_CW_ROUNDEVEN. (ix86_entry): Add I387_ROUNDEVEN. (avx_u128_state): Add I387_CW_ANY. * config/i386/i386.md: Define UNSPEC_FRNDINT_ROUNDEVEN. (define_int_iterator): Likewise. (define_int_attr): Likewise for rounding_insn, rounding and ROUNDING. (define_constant): Define ROUND_ROUNDEVEN mode. (define_attr): Add roundeven mode for i387_cw. (<rouding_insn><mode>2): Add condition for ROUND_ROUNDEVEN. * internal-fn.def (ROUNDEVEN): New builtin function. * optabs.def (roundeven_optab): New optab. gcc/testsuite/ChangeLog: 2019-08-26 Tejas Joshi <tejasjoshi9673@gmail.com> * gcc.target/i386/sse4_1-round-roundeven-1.c: New test. * gcc.target/i386/sse4_1-round-roundeven-2.c: New test. Co-Authored-By: Uros Bizjak <ubizjak@gmail.com> From-SVN: r274928
2019-08-15Add support for conditional shiftsRichard Sandiford1-0/+3
This patch adds support for IFN_COND shifts left and shifts right. This is mostly mechanical, but since we try to handle conditional operations in the same way as unconditional operations in match.pd, we need to support IFN_COND shifts by scalars as well as vectors. E.g.: IFN_COND_SHL (cond, a, { 1, 1, ... }, fallback) and: IFN_COND_SHL (cond, a, 1, fallback) are the same operation, with: (for shiftrotate (lrotate rrotate lshift rshift) ... /* Prefer vector1 << scalar to vector1 << vector2 if vector2 is uniform. */ (for vec (VECTOR_CST CONSTRUCTOR) (simplify (shiftrotate @0 vec@1) (with { tree tem = uniform_vector_p (@1); } (if (tem) (shiftrotate @0 { tem; })))))) preferring the latter. The patch copes with this by extending create_convert_operand_from to handle scalar-to-vector conversions. 2019-08-15 Richard Sandiford <richard.sandiford@arm.com> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> gcc/ * internal-fn.def (IFN_COND_SHL, IFN_COND_SHR): New internal functions. * internal-fn.c (FOR_EACH_CODE_MAPPING): Handle shifts. * match.pd (UNCOND_BINARY, COND_BINARY): Likewise. * optabs.def (cond_ashl_optab, cond_ashr_optab, cond_lshr_optab): New optabs. * optabs.h (create_convert_operand_from): Expand comment. * optabs.c (maybe_legitimize_operand): Allow implicit broadcasts when mapping scalar rtxes to vector operands. * config/aarch64/iterators.md (SVE_INT_BINARY): Add ashift, ashiftrt and lshiftrt. (sve_int_op, sve_int_op_rev, sve_pred_int_rhs2_operand): Handle them. * config/aarch64/aarch64-sve.md (*cond_<optab><mode>_2_const) (*cond_<optab><mode>_any_const): New patterns. gcc/testsuite/ * gcc.target/aarch64/sve/cond_shift_1.c: New test. * gcc.target/aarch64/sve/cond_shift_1_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_2.c: Likewise. * gcc.target/aarch64/sve/cond_shift_2_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_3.c: Likewise. * gcc.target/aarch64/sve/cond_shift_3_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_4.c: Likewise. * gcc.target/aarch64/sve/cond_shift_4_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_5.c: Likewise. * gcc.target/aarch64/sve/cond_shift_5_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_6.c: Likewise. * gcc.target/aarch64/sve/cond_shift_6_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_7.c: Likewise. * gcc.target/aarch64/sve/cond_shift_7_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_8.c: Likewise. * gcc.target/aarch64/sve/cond_shift_8_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_9.c: Likewise. * gcc.target/aarch64/sve/cond_shift_9_run.c: Likewise. Co-Authored-By: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> From-SVN: r274505
2019-07-02optabs.def (movmem_optab): Add movmem back for memmove().Aaron Sawdey1-0/+1
2019-07-02 Aaron Sawdey <acsawdey@linux.ibm.com> * optabs.def (movmem_optab): Add movmem back for memmove(). * doc/md.texi: Add description of movmem pattern for overlapping move. From-SVN: r272946
2019-06-27builtins.c (get_memory_rtx): Fix comment.Aaron Sawdey1-1/+1
2019-06-27 Aaron Sawdey <acsawdey@linux.ibm.com> * builtins.c (get_memory_rtx): Fix comment. * optabs.def (movmem_optab): Change to cpymem_optab. * expr.c (emit_block_move_via_cpymem): Change movmem to cpymem. (emit_block_move_hints): Change movmem to cpymem. * defaults.h: Change movmem to cpymem. * targhooks.c (get_move_ratio): Change movmem to cpymem. (default_use_by_pieces_infrastructure_p): Ditto. * config/aarch64/aarch64-protos.h: Change movmem to cpymem. * config/aarch64/aarch64.c (aarch64_expand_movmem): Change movmem to cpymem. * config/aarch64/aarch64.h: Change movmem to cpymem. * config/aarch64/aarch64.md (movmemdi): Change name to cpymemdi. * config/alpha/alpha.h: Change movmem to cpymem in comment. * config/alpha/alpha.md (movmemqi, movmemdi, *movmemdi_1): Change movmem to cpymem. * config/arc/arc-protos.h: Change movmem to cpymem. * config/arc/arc.c (arc_expand_movmem): Change movmem to cpymem. * config/arc/arc.h: Change movmem to cpymem in comment. * config/arc/arc.md (movmemsi): Change movmem to cpymem. * config/arm/arm-protos.h: Change movmem to cpymem in names. * config/arm/arm.c (arm_movmemqi_unaligned, arm_gen_movmemqi, gen_movmem_ldrd_strd, thumb_expand_movmemqi) Change movmem to cpymem. * config/arm/arm.md (movmemqi): Change movmem to cpymem. * config/arm/thumb1.md (movmem12b, movmem8b): Change movmem to cpymem. * config/avr/avr-protos.h: Change movmem to cpymem. * config/avr/avr.c (avr_adjust_insn_length, avr_emit_movmemhi, avr_out_movmem): Change movmem to cpymem. * config/avr/avr.md (movmemhi, movmem_<mode>, movmemx_<mode>): Change movmem to cpymem. * config/bfin/bfin-protos.h: Change movmem to cpymem. * config/bfin/bfin.c (single_move_for_movmem, bfin_expand_movmem): Change movmem to cpymem. * config/bfin/bfin.h: Change movmem to cpymem in comment. * config/bfin/bfin.md (movmemsi): Change name to cpymemsi. * config/c6x/c6x-protos.h: Change movmem to cpymem. * config/c6x/c6x.c (c6x_expand_movmem): Change movmem to cpymem. * config/c6x/c6x.md (movmemsi): Change name to cpymemsi. * config/frv/frv.md (movmemsi): Change name to cpymemsi. * config/ft32/ft32.md (movmemsi): Change name to cpymemsi. * config/h8300/h8300.md (movmemsi): Change name to cpymemsi. * config/i386/i386-expand.c (expand_set_or_movmem_via_loop, expand_set_or_movmem_via_rep, expand_movmem_epilogue, expand_setmem_epilogue_via_loop, expand_set_or_cpymem_prologue, expand_small_cpymem_or_setmem, expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves, expand_set_or_cpymem_constant_prologue, ix86_expand_set_or_cpymem): Change movmem to cpymem. * config/i386/i386-protos.h: Change movmem to cpymem. * config/i386/i386.h: Change movmem to cpymem in comment. * config/i386/i386.md (movmem<mode>): Change name to cpymem. (setmem<mode>): Change expansion function name. * config/lm32/lm32.md (movmemsi): Change name to cpymemsi. * config/m32c/blkmov.md (movmemhi, movmemhi_bhi_op, movmemhi_bpsi_op, movmemhi_whi_op, movmemhi_wpsi_op): Change movmem to cpymem. * config/m32c/m32c-protos.h: Change movmem to cpymem. * config/m32c/m32c.c (m32c_expand_movmemhi): Change movmem to cpymem. * config/m32r/m32r.c (m32r_expand_block_move): Change movmem to cpymem. * config/m32r/m32r.md (movmemsi, movmemsi_internal): Change movmem to cpymem. * config/mcore/mcore.md (movmemsi): Change name to cpymemsi. * config/microblaze/microblaze.c: Change movmem to cpymem in comment. * config/microblaze/microblaze.md (movmemsi): Change name to cpymemsi. * config/mips/mips.c (mips_use_by_pieces_infrastructure_p): Change movmem to cpymem. * config/mips/mips.h: Change movmem to cpymem. * config/mips/mips.md (movmemsi): Change name to cpymemsi. * config/nds32/nds32-memory-manipulation.c (nds32_expand_movmemsi_loop_unknown_size, nds32_expand_movmemsi_loop_known_size, nds32_expand_movmemsi_loop, nds32_expand_movmemsi_unroll, nds32_expand_movmemsi): Change movmem to cpymem. * config/nds32/nds32-multiple.md (movmemsi): Change name to cpymemsi. * config/nds32/nds32-protos.h: Change movmem to cpymem. * config/pa/pa.c (compute_movmem_length): Change movmem to cpymem. (pa_adjust_insn_length): Change call to compute_movmem_length. * config/pa/pa.md (movmemsi, movmemsi_prereload, movmemsi_postreload, movmemdi, movmemdi_prereload, movmemdi_postreload): Change movmem to cpymem. * config/pdp11/pdp11.md (movmemhi, movmemhi1, movmemhi_nocc, UNSPEC_MOVMEM): Change movmem to cpymem. * config/riscv/riscv.c: Change movmem to cpymem in comment. * config/riscv/riscv.h: Change movmem to cpymem. * config/riscv/riscv.md: (movmemsi) Change name to cpymemsi. * config/rs6000/rs6000.md: (movmemsi) Change name to cpymemsi. * config/rx/rx.md: (UNSPEC_MOVMEM, movmemsi, rx_movmem): Change movmem to cpymem. * config/s390/s390-protos.h: Change movmem to cpymem. * config/s390/s390.c (s390_expand_movmem, s390_expand_setmem, s390_expand_insv): Change movmem to cpymem. * config/s390/s390.md (movmem<mode>, movmem_short, *movmem_short, movmem_long, *movmem_long, *movmem_long_31z): Change movmem to cpymem. * config/sh/sh.md (movmemsi): Change name to cpymemsi. * config/sparc/sparc.h: Change movmem to cpymem in comment. * config/vax/vax-protos.h (vax_output_movmemsi): Remove prototype for nonexistent function. * config/vax/vax.h: Change movmem to cpymem in comment. * config/vax/vax.md (movmemhi, movmemhi1): Change movmem to cpymem. * config/visium/visium.h: Change movmem to cpymem in comment. * config/visium/visium.md (movmemsi): Change name to cpymemsi. * config/xtensa/xtensa.md (movmemsi): Change name to cpymemsi. * doc/md.texi: Change movmem to cpymem and update description to match. * doc/rtl.texi: Change movmem to cpymem. * target.def (use_by_pieces_infrastructure_p): Change movmem to cpymem. * doc/tm.texi: Regenerate. From-SVN: r272755
2019-06-19md.texi: Document vec_shl_<mode> pattern.Jakub Jelinek1-0/+1
* doc/md.texi: Document vec_shl_<mode> pattern. * optabs.def (vec_shl_optab): New optab. * optabs.c (shift_amt_for_vec_perm_mask): Add shift_optab argument, if == vec_shl_optab, check for left whole vector shift pattern rather than right shift. (expand_vec_perm_const): Add vec_shl_optab support. * optabs-query.c (can_vec_perm_var_p): Mention also vec_shl optab in the comment. * tree-vect-generic.c (lower_vec_perm): Support permutations which can be handled by vec_shl_optab. * tree-vect-stmts.c (scan_store_can_perm_p): New function. (check_scan_store): Use it. (vectorizable_scan_store): If target can't do normal permutations, try to use whole vector left shifts and if needed a VEC_COND_EXPR after it. * config/i386/sse.md (vec_shl_<mode>): New expander. * gcc.dg/vect/vect-simd-8.c: If main is defined, don't include tree-vect.h nor call check_vect. * gcc.dg/vect/vect-simd-9.c: Likewise. * gcc.dg/vect/vect-simd-10.c: New test. * gcc.target/i386/sse2-vect-simd-8.c: New test. * gcc.target/i386/sse2-vect-simd-9.c: New test. * gcc.target/i386/sse2-vect-simd-10.c: New test. * gcc.target/i386/avx2-vect-simd-8.c: New test. * gcc.target/i386/avx2-vect-simd-9.c: New test. * gcc.target/i386/avx2-vect-simd-10.c: New test. * gcc.target/i386/avx512f-vect-simd-8.c: New test. * gcc.target/i386/avx512f-vect-simd-9.c: New test. * gcc.target/i386/avx512f-vect-simd-10.c: New test. From-SVN: r272472
2019-06-18[Vectorizer] Support masking fold left reductionsAlejandro Martinez1-0/+1
This patch adds support in the vectorizer for masking fold left reductions. This avoids the need to insert a conditional assignement with some identity value. From-SVN: r272407
2019-01-01Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r267494
2018-12-21re PR target/88556 (Inline built-in sinh, cosh, tanh for -ffast-math)Uros Bizjak1-0/+3
PR target/88556 * internal-fn.def (COSH): New. (SINH): Ditto. (TANH): Ditto. * optabs.def (cosh_optab): New. (sinh_optab): Ditto. (tanh_optab): Ditto. * config/i386/i386-protos.h (ix86_emit_i387_sinh): New prototype. (ix86_emit_i387_cosh): Ditto. (ix86_emit_i387_tanh): Ditto. * config/i386/i386.c (ix86_emit_i387_sinh): New function. (ix86_emit_i387_cosh): Ditto. (ix86_emit_i387_tanh): Ditto. * config/i386/i386.md (sinhxf2): New expander. (sinh<mode>2): Ditto. (coshxf2): Ditto. (cosh<mode>2): Ditto. (tanhxf2): Ditto. (tanh<mode>2): Ditto. From-SVN: r267325
2018-12-18re PR target/88513 (FAIL: gcc.target/i386/pr59591-1.c)Jakub Jelinek1-0/+3
PR target/88513 PR target/88514 * optabs.def (vec_pack_sbool_trunc_optab, vec_unpacks_sbool_hi_optab, vec_unpacks_sbool_lo_optab): New optabs. * optabs.c (expand_widen_pattern_expr): Use vec_unpacks_sbool_*_optab and pass additional argument if both input and target have the same scalar mode of VECTOR_BOOLEAN_TYPE_P vectors. * expr.c (expand_expr_real_2) <case VEC_PACK_TRUNC_EXPR>: Handle VECTOR_BOOLEAN_TYPE_P pack where result has the same scalar mode as the operands using vec_pack_sbool_trunc_optab. * tree-vect-stmts.c (supportable_widening_operation): Use vec_unpacks_sbool_{lo,hi}_optab for VECTOR_BOOLEAN_TYPE_P conversions where both wider_vectype and vectype have the same scalar mode. (supportable_narrowing_operation): Similarly use vec_pack_sbool_trunc_optab if narrow_vectype and vectype have the same scalar mode. * config/i386/i386.c (ix86_get_builtin) <case IX86_BUILTIN_GATHER3ALTDIV8SF>: Check for VECTOR_MODE_P rather than non-VOIDmode. * config/i386/sse.md (vec_pack_trunc_qi, vec_pack_trunc_<mode>): Remove useless ()s around "register_operand", formatting fixes. (vec_pack_sbool_trunc_qi, vec_unpacks_sbool_lo_qi, vec_unpacks_sbool_hi_qi): New expanders. * doc/md.texi (vec_pack_sbool_trunc_M, vec_unpacks_sbool_hi_M, vec_unpacks_sbool_lo_M): Document. * gcc.target/i386/avx512f-pr88513-1.c: New test. * gcc.target/i386/avx512f-pr88513-2.c: New test. * gcc.target/i386/avx512vl-pr88464-1.c: New test. * gcc.target/i386/avx512vl-pr88464-2.c: New test. * gcc.target/i386/avx512vl-pr88464-3.c: New test. * gcc.target/i386/avx512vl-pr88464-4.c: New test. * gcc.target/i386/avx512vl-pr88513-1.c: New test. * gcc.target/i386/avx512vl-pr88513-2.c: New test. * gcc.target/i386/avx512vl-pr88513-3.c: New test. * gcc.target/i386/avx512vl-pr88513-4.c: New test. * gcc.target/i386/avx512vl-pr88514-1.c: New test. * gcc.target/i386/avx512vl-pr88514-2.c: New test. * gcc.target/i386/avx512vl-pr88514-3.c: New test. From-SVN: r267228
2018-12-17re PR target/88502 (Inline built-in asinh, acosh, atanh for -ffast-math)Uros Bizjak1-0/+3
PR target/88502 * internal-fn.def (ACOSH): New. (ASINH): Ditto. (ATANH): Ditto. * optabs.def (acosh_optab): New. (asinh_optab): Ditto. (atanh_optab): Ditto. * config/i386/i386-protos.h (ix86_emit_i387_asinh): New prototype. (ix86_emit_i387_acosh): Ditto. (ix86_emit_i387_atanh): Ditto. * config/i386/i386.c (ix86_emit_i387_asinh): New function. (ix86_emit_i387_acosh): Ditto. (ix86_emit_i387_atanh): Ditto. * config/i386/i386.md (asinhxf2): New expander. (asinh<mode>2): Ditto. (acoshxf2): Ditto. (acosh<mode>2): Ditto. (atanhxf2): Ditto. (atanh<mode>2): Ditto. From-SVN: r267204
2018-12-14re PR target/88474 (Inline built-in hypot for -ffast-math)Uros Bizjak1-0/+1
PR target/88474 * internal-fn.def (HYPOT): New. * optabs.def (hypot_optab): New. * config/i386/i386.md (hypot<mode>3): New expander. From-SVN: r267137
2018-07-12Add IFN_COND_FMA functionsRichard Sandiford1-0/+4
This patch adds conditional equivalents of the IFN_FMA built-in functions. Most of it is just a mechanical extension of the binary stuff. 2018-07-12 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * doc/md.texi (cond_fma, cond_fms, cond_fnma, cond_fnms): Document. * optabs.def (cond_fma_optab, cond_fms_optab, cond_fnma_optab) (cond_fnms_optab): New optabs. * internal-fn.def (COND_FMA, COND_FMS, COND_FNMA, COND_FNMS): New internal functions. (FMA): Use DEF_INTERNAL_FLT_FN rather than DEF_INTERNAL_FLT_FLOATN_FN. * internal-fn.h (get_conditional_internal_fn): Declare. (get_unconditional_internal_fn): Likewise. * internal-fn.c (cond_ternary_direct): New macro. (expand_cond_ternary_optab_fn): Likewise. (direct_cond_ternary_optab_supported_p): Likewise. (FOR_EACH_COND_FN_PAIR): Likewise. (get_conditional_internal_fn): New function. (get_unconditional_internal_fn): Likewise. * gimple-match.h (gimple_match_op::MAX_NUM_OPS): Bump to 5. (gimple_match_op::gimple_match_op): Add a new overload for 5 operands. (gimple_match_op::set_op): Likewise. (gimple_resimplify5): Declare. * genmatch.c (decision_tree::gen): Generate simplifications for 5 operands. * gimple-match-head.c (gimple_simplify): Define an overload for 5 operands. Handle calls with 5 arguments in the top-level overload. (convert_conditional_op): Handle conversions from unconditional internal functions to conditional ones. (gimple_resimplify5): New function. (build_call_internal): Pass a fifth operand. (maybe_push_res_to_seq): Likewise. (try_conditional_simplification): Try converting conditional internal functions to unconditional internal functions. Handle 3-operand unconditional forms. * match.pd (UNCOND_TERNARY, COND_TERNARY): Operator lists. Define ternary equivalents of the current rules for binary conditional internal functions. * config/aarch64/aarch64.c (aarch64_preferred_else_value): Handle ternary operations. * config/aarch64/iterators.md (UNSPEC_COND_FMLA, UNSPEC_COND_FMLS) (UNSPEC_COND_FNMLA, UNSPEC_COND_FNMLS): New unspecs. (optab): Handle them. (SVE_COND_FP_TERNARY): New int iterator. (sve_fmla_op, sve_fmad_op): New int attributes. * config/aarch64/aarch64-sve.md (cond_<optab><mode>) (*cond_<optab><mode>_2, *cond_<optab><mode_4) (*cond_<optab><mode>_any): New SVE_COND_FP_TERNARY patterns. gcc/testsuite/ * gcc.dg/vect/vect-cond-arith-3.c: New test. * gcc.target/aarch64/sve/vcond_13.c: Likewise. * gcc.target/aarch64/sve/vcond_13_run.c: Likewise. * gcc.target/aarch64/sve/vcond_14.c: Likewise. * gcc.target/aarch64/sve/vcond_14_run.c: Likewise. * gcc.target/aarch64/sve/vcond_15.c: Likewise. * gcc.target/aarch64/sve/vcond_15_run.c: Likewise. * gcc.target/aarch64/sve/vcond_16.c: Likewise. * gcc.target/aarch64/sve/vcond_16_run.c: Likewise. From-SVN: r262587
2018-07-03[16/n] PR85694: Add detection of averaging operationsRichard Sandiford1-0/+4
This patch adds detection of average instructions: a = (((wide) b + (wide) c) >> 1); --> a = (wide) .AVG_FLOOR (b, c); a = (((wide) b + (wide) c + 1) >> 1); --> a = (wide) .AVG_CEIL (b, c); in cases where users of "a" need only the low half of the result, making the cast to (wide) redundant. The heavy lifting was done by earlier patches. This showed up another problem in vectorizable_call: if the call is a pattern definition statement rather than the main pattern statement, the type of vectorised call might be different from the type of the original statement. 2018-07-03 Richard Sandiford <richard.sandiford@arm.com> gcc/ PR tree-optimization/85694 * doc/md.texi (avgM3_floor, uavgM3_floor, avgM3_ceil) (uavgM3_ceil): Document new optabs. * doc/sourcebuild.texi (vect_avg_qi): Document new target selector. * internal-fn.def (IFN_AVG_FLOOR, IFN_AVG_CEIL): New internal functions. * optabs.def (savg_floor_optab, uavg_floor_optab, savg_ceil_optab) (savg_ceil_optab): New optabs. * tree-vect-patterns.c (vect_recog_average_pattern): New function. (vect_vect_recog_func_ptrs): Add it. * tree-vect-stmts.c (vectorizable_call): Get the type of the zero constant directly from the associated lhs. gcc/testsuite/ PR tree-optimization/85694 * lib/target-supports.exp (check_effective_target_vect_avg_qi): New proc. * gcc.dg/vect/vect-avg-1.c: New test. * gcc.dg/vect/vect-avg-2.c: Likewise. * gcc.dg/vect/vect-avg-3.c: Likewise. * gcc.dg/vect/vect-avg-4.c: Likewise. * gcc.dg/vect/vect-avg-5.c: Likewise. * gcc.dg/vect/vect-avg-6.c: Likewise. * gcc.dg/vect/vect-avg-7.c: Likewise. * gcc.dg/vect/vect-avg-8.c: Likewise. * gcc.dg/vect/vect-avg-9.c: Likewise. * gcc.dg/vect/vect-avg-10.c: Likewise. * gcc.dg/vect/vect-avg-11.c: Likewise. * gcc.dg/vect/vect-avg-12.c: Likewise. * gcc.dg/vect/vect-avg-13.c: Likewise. * gcc.dg/vect/vect-avg-14.c: Likewise. From-SVN: r262335
2018-05-29re PR target/85918 (Conversions to/from [unsigned] long long are not ↵Jakub Jelinek1-0/+6
vectorized for AVX512DQ target) PR target/85918 * tree.def (VEC_UNPACK_FIX_TRUNC_HI_EXPR, VEC_UNPACK_FIX_TRUNC_LO_EXPR, VEC_PACK_FLOAT_EXPR): New tree codes. * tree-pretty-print.c (op_code_prio): Handle VEC_UNPACK_FIX_TRUNC_HI_EXPR and VEC_UNPACK_FIX_TRUNC_LO_EXPR. (dump_generic_node): Handle VEC_UNPACK_FIX_TRUNC_HI_EXPR, VEC_UNPACK_FIX_TRUNC_LO_EXPR and VEC_PACK_FLOAT_EXPR. * tree-inline.c (estimate_operator_cost): Likewise. * gimple-pretty-print.c (dump_binary_rhs): Handle VEC_PACK_FLOAT_EXPR. * fold-const.c (const_binop): Likewise. (const_unop): Handle VEC_UNPACK_FIX_TRUNC_HI_EXPR and VEC_UNPACK_FIX_TRUNC_LO_EXPR. * tree-cfg.c (verify_gimple_assign_unary): Likewise. (verify_gimple_assign_binary): Handle VEC_PACK_FLOAT_EXPR. * cfgexpand.c (expand_debug_expr): Handle VEC_UNPACK_FIX_TRUNC_HI_EXPR, VEC_UNPACK_FIX_TRUNC_LO_EXPR and VEC_PACK_FLOAT_EXPR. * expr.c (expand_expr_real_2): Likewise. * optabs.def (vec_packs_float_optab, vec_packu_float_optab, vec_unpack_sfix_trunc_hi_optab, vec_unpack_sfix_trunc_lo_optab, vec_unpack_ufix_trunc_hi_optab, vec_unpack_ufix_trunc_lo_optab): New optabs. * optabs.c (expand_widen_pattern_expr): For VEC_UNPACK_FIX_TRUNC_HI_EXPR and VEC_UNPACK_FIX_TRUNC_LO_EXPR use sign from result type rather than operand's type. (expand_binop_directly): For vec_packu_float_optab and vec_packs_float_optab allow result type to be different from operand's type. * optabs-tree.c (optab_for_tree_code): Handle VEC_UNPACK_FIX_TRUNC_HI_EXPR, VEC_UNPACK_FIX_TRUNC_LO_EXPR and VEC_PACK_FLOAT_EXPR. Formatting fixes. * tree-vect-generic.c (expand_vector_operations_1): Handle VEC_UNPACK_FIX_TRUNC_HI_EXPR, VEC_UNPACK_FIX_TRUNC_LO_EXPR and VEC_PACK_FLOAT_EXPR. * tree-vect-stmts.c (supportable_widening_operation): Handle FIX_TRUNC_EXPR. (supportable_narrowing_operation): Handle FLOAT_EXPR. * config/i386/i386.md (fixprefix, floatprefix): New code attributes. * config/i386/sse.md (*float<floatunssuffix>v2div2sf2): Rename to ... (float<floatunssuffix>v2div2sf2): ... this. Formatting fix. (vpckfloat_concat_mode, vpckfloat_temp_mode, vpckfloat_op_mode): New mode attributes. (vec_pack<floatprefix>_float_<mode>): New expander. (vunpckfixt_mode, vunpckfixt_model, vunpckfixt_extract_mode): New mode attributes. (vec_unpack_<fixprefix>fix_trunc_lo_<mode>, vec_unpack_<fixprefix>fix_trunc_hi_<mode>): New expanders. * doc/md.texi (vec_packs_float_@var{m}, vec_packu_float_@var{m}, vec_unpack_sfix_trunc_hi_@var{m}, vec_unpack_sfix_trunc_lo_@var{m}, vec_unpack_ufix_trunc_hi_@var{m}, vec_unpack_ufix_trunc_lo_@var{m}): Document. * doc/generic.texi (VEC_UNPACK_FLOAT_HI_EXPR, VEC_UNPACK_FLOAT_LO_EXPR): Fix pasto in description. (VEC_UNPACK_FIX_TRUNC_HI_EXPR, VEC_UNPACK_FIX_TRUNC_LO_EXPR, VEC_PACK_FLOAT_EXPR): Document. * gcc.target/i386/avx512dq-pr85918.c: Add -mprefer-vector-width=512 and -fno-vect-cost-model options. Add aligned(64) attribute to the arrays. Add suffix 1 to all functions and use 4 iterations rather than N. Add functions with conversions to and from float. Add new set of functions with 8 iterations and another one with 16 iterations, expect 24 vectorized loops instead of just 4. * gcc.target/i386/avx512dq-pr85918-2.c: New test. From-SVN: r260893
2018-05-25Add IFN_COND_{MUL,DIV,MOD,RDIV}Richard Sandiford1-0/+5
This patch adds support for conditional multiplication and division. It's mostly mechanical, but a few notes: * The *_optab name and the .md names are the same as the unconditional forms, just with "cond_" added to the front. This means we still have the awkward difference between sdiv and div, etc. * It was easier to retain the difference between integer and FP division in the function names, given that they map to different tree codes (TRUNC_DIV_EXPR and RDIV_EXPR). * SVE has no direct support for IFN_COND_MOD, but it seemed more consistent to add it anyway. * Adding IFN_COND_MUL enables an extra fully-masked reduction in gcc.dg/vect/pr53773.c. * In practice we don't actually use the integer division forms without if-conversion support (added by a later patch). 2018-05-25 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * doc/sourcebuild.texi (vect_double_cond_arith): Include multiplication and division. * doc/md.texi (cond_mul@var{m}, cond_div@var{m}, cond_mod@var{m}) (cond_udiv@var{m}, cond_umod@var{m}): Document. * optabs.def (cond_smul_optab, cond_sdiv_optab, cond_smod_optab) (cond_udiv_optab, cond_umod_optab): New optabs. * internal-fn.def (IFN_COND_MUL, IFN_COND_DIV, IFN_COND_MOD) (IFN_COND_RDIV): New internal functions. * internal-fn.c (get_conditional_internal_fn): Handle TRUNC_DIV_EXPR, TRUNC_MOD_EXPR and RDIV_EXPR. * match.pd (UNCOND_BINARY, COND_BINARY): Handle them. * config/aarch64/iterators.md (UNSPEC_COND_MUL, UNSPEC_COND_DIV): New unspecs. (SVE_INT_BINARY): Include mult. (SVE_COND_FP_BINARY): Include UNSPEC_MUL and UNSPEC_DIV. (optab, sve_int_op): Handle mult. (optab, sve_fp_op, commutative): Handle UNSPEC_COND_MUL and UNSPEC_COND_DIV. * config/aarch64/aarch64-sve.md (cond_<optab><mode>): New pattern for SVE_INT_BINARY_SD. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_double_cond_arith): Include multiplication and division. * gcc.dg/vect/pr53773.c: Do not expect a scalar tail when using fully-masked loops with a fixed vector length. * gcc.dg/vect/vect-cond-arith-1.c: Add multiplication and division tests. * gcc.target/aarch64/sve/vcond_8.c: Likewise. * gcc.target/aarch64/sve/vcond_9.c: Likewise. * gcc.target/aarch64/sve/vcond_12.c: Add multiplication tests. From-SVN: r260713
2018-01-13Add support for SVE scatter storesRichard Sandiford1-0/+2
This is mostly a mechanical extension of the previous gather load support to scatter stores. The internal functions in this case are: IFN_SCATTER_STORE (base, offsets, scale, values) IFN_MASK_SCATTER_STORE (base, offsets, scale, values, mask) However, one nonobvious change is to vect_analyze_data_ref_access. If we're treating an access as a gather load or scatter store (i.e. if STMT_VINFO_GATHER_SCATTER_P is true), the existing code would create a dummy data_reference whose step is 0. There's not really much else it could do, since the whole point is that the step isn't predictable from iteration to iteration. We then went into this code in vect_analyze_data_ref_access: /* Allow loads with zero step in inner-loop vectorization. */ if (loop_vinfo && integer_zerop (step)) { GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; if (!nested_in_vect_loop_p (loop, stmt)) return DR_IS_READ (dr); I.e. we'd take the step literally and assume that this is a load or store to an invariant address. Loads from invariant addresses are supported but stores to them aren't. The code therefore had the effect of disabling all scatter stores. AFAICT this is true of AVX too: although tests like avx512f-scatter-1.c test for the correctness of a scatter-like loop, they don't seem to check whether a scatter instruction is actually used. The patch therefore makes vect_analyze_data_ref_access return true for scatters. We do seem to handle the aliasing correctly; that's tested by other functions, and is symmetrical to the already-working gather case. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/sourcebuild.texi (vect_scatter_store): Document. * optabs.def (scatter_store_optab, mask_scatter_store_optab): New optabs. * doc/md.texi (scatter_store@var{m}, mask_scatter_store@var{m}): Document. * genopinit.c (main): Add supports_vec_scatter_store and supports_vec_scatter_store_cached to target_optabs. * gimple.h (gimple_expr_type): Handle IFN_SCATTER_STORE and IFN_MASK_SCATTER_STORE. * internal-fn.def (SCATTER_STORE, MASK_SCATTER_STORE): New internal functions. * internal-fn.h (internal_store_fn_p): Declare. (internal_fn_stored_value_index): Likewise. * internal-fn.c (scatter_store_direct): New macro. (expand_scatter_store_optab_fn): New function. (direct_scatter_store_optab_supported_p): New macro. (internal_store_fn_p): New function. (internal_gather_scatter_fn_p): Handle IFN_SCATTER_STORE and IFN_MASK_SCATTER_STORE. (internal_fn_mask_index): Likewise. (internal_fn_stored_value_index): New function. (internal_gather_scatter_fn_supported_p): Adjust operand numbers for scatter stores. * optabs-query.h (supports_vec_scatter_store_p): Declare. * optabs-query.c (supports_vec_scatter_store_p): New function. * tree-vectorizer.h (vect_get_store_rhs): Declare. * tree-vect-data-refs.c (vect_analyze_data_ref_access): Return true for scatter stores. (vect_gather_scatter_fn_p): Handle scatter stores too. (vect_check_gather_scatter): Consider using scatter stores if supports_vec_scatter_store_p. * tree-vect-patterns.c (vect_try_gather_scatter_pattern): Handle scatter stores too. * tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use internal_fn_stored_value_index. (check_load_store_masking): Handle scatter stores too. (vect_get_store_rhs): Make public. (vectorizable_call): Use internal_store_fn_p. (vectorizable_store): Handle scatter store internal functions. (vect_transform_stmt): Compare GROUP_STORE_COUNT with GROUP_SIZE when deciding whether the end of the group has been reached. * config/aarch64/aarch64.md (UNSPEC_ST1_SCATTER): New unspec. * config/aarch64/aarch64-sve.md (scatter_store<mode>): New expander. (mask_scatter_store<mode>): New insns. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_scatter_store): New proc. * gcc.dg/vect/pr25413a.c: Expect both loops to be optimized on targets with scatter stores. * gcc.dg/vect/vect-71.c: Restrict XFAIL to targets without scatter stores. * gcc.target/aarch64/sve/mask_scatter_store_1.c: New test. * gcc.target/aarch64/sve/mask_scatter_store_2.c: Likewise. * gcc.target/aarch64/sve/scatter_store_1.c: Likewise. * gcc.target/aarch64/sve/scatter_store_2.c: Likewise. * gcc.target/aarch64/sve/scatter_store_3.c: Likewise. * gcc.target/aarch64/sve/scatter_store_4.c: Likewise. * gcc.target/aarch64/sve/scatter_store_5.c: Likewise. * gcc.target/aarch64/sve/scatter_store_6.c: Likewise. * gcc.target/aarch64/sve/scatter_store_7.c: Likewise. * gcc.target/aarch64/sve/strided_store_1.c: Likewise. * gcc.target/aarch64/sve/strided_store_2.c: Likewise. * gcc.target/aarch64/sve/strided_store_3.c: Likewise. * gcc.target/aarch64/sve/strided_store_4.c: Likewise. * gcc.target/aarch64/sve/strided_store_5.c: Likewise. * gcc.target/aarch64/sve/strided_store_6.c: Likewise. * gcc.target/aarch64/sve/strided_store_7.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256643
2018-01-13Add support for SVE gather loadsRichard Sandiford1-0/+3
This patch adds support for SVE gather loads. It uses the basically the same analysis code as the AVX gather support, but after that there are two major differences: - It uses new internal functions rather than target built-ins. The interface is: IFN_GATHER_LOAD (base, offsets scale) IFN_MASK_GATHER_LOAD (base, offsets scale, mask) which should be reasonably generic. One of the advantages of using internal functions is that other passes can understand what the functions do, but a more immediate advantage is that we can query the underlying target pattern to see which scales it supports. - It uses pattern recognition to convert the offset to the right width, if it was originally narrower than that. This avoids having to do a widening operation as part of the gather expansion itself. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (gather_load@var{m}): Document. (mask_gather_load@var{m}): Likewise. * genopinit.c (main): Add supports_vec_gather_load and supports_vec_gather_load_cached to target_optabs. * optabs-tree.c (init_tree_optimization_optabs): Use ggc_cleared_alloc to allocate target_optabs. * optabs.def (gather_load_optab, mask_gather_laod_optab): New optabs. * internal-fn.def (GATHER_LOAD, MASK_GATHER_LOAD): New internal functions. * internal-fn.h (internal_load_fn_p): Declare. (internal_gather_scatter_fn_p): Likewise. (internal_fn_mask_index): Likewise. (internal_gather_scatter_fn_supported_p): Likewise. * internal-fn.c (gather_load_direct): New macro. (expand_gather_load_optab_fn): New function. (direct_gather_load_optab_supported_p): New macro. (direct_internal_fn_optab): New function. (internal_load_fn_p): Likewise. (internal_gather_scatter_fn_p): Likewise. (internal_fn_mask_index): Likewise. (internal_gather_scatter_fn_supported_p): Likewise. * optabs-query.c (supports_at_least_one_mode_p): New function. (supports_vec_gather_load_p): Likewise. * optabs-query.h (supports_vec_gather_load_p): Declare. * tree-vectorizer.h (gather_scatter_info): Add ifn, element_type and memory_type field. (NUM_PATTERNS): Bump to 15. * tree-vect-data-refs.c: Include internal-fn.h. (vect_gather_scatter_fn_p): New function. (vect_describe_gather_scatter_call): Likewise. (vect_check_gather_scatter): Try using internal functions for gather loads. Recognize existing calls to a gather load function. (vect_analyze_data_refs): Consider using gather loads if supports_vec_gather_load_p. * tree-vect-patterns.c (vect_get_load_store_mask): New function. (vect_get_gather_scatter_offset_type): Likewise. (vect_convert_mask_for_vectype): Likewise. (vect_add_conversion_to_patterm): Likewise. (vect_try_gather_scatter_pattern): Likewise. (vect_recog_gather_scatter_pattern): New pattern recognizer. (vect_vect_recog_func_ptrs): Add it. * tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use internal_fn_mask_index and internal_gather_scatter_fn_p. (check_load_store_masking): Take the gather_scatter_info as an argument and handle gather loads. (vect_get_gather_scatter_ops): New function. (vectorizable_call): Check internal_load_fn_p. (vectorizable_load): Likewise. Handle gather load internal functions. (vectorizable_store): Update call to check_load_store_masking. * config/aarch64/aarch64.md (UNSPEC_LD1_GATHER): New unspec. * config/aarch64/iterators.md (SVE_S, SVE_D): New mode iterators. * config/aarch64/predicates.md (aarch64_gather_scale_operand_w) (aarch64_gather_scale_operand_d): New predicates. * config/aarch64/aarch64-sve.md (gather_load<mode>): New expander. (mask_gather_load<mode>): New insns. gcc/testsuite/ * gcc.target/aarch64/sve/gather_load_1.c: New test. * gcc.target/aarch64/sve/gather_load_2.c: Likewise. * gcc.target/aarch64/sve/gather_load_3.c: Likewise. * gcc.target/aarch64/sve/gather_load_4.c: Likewise. * gcc.target/aarch64/sve/gather_load_5.c: Likewise. * gcc.target/aarch64/sve/gather_load_6.c: Likewise. * gcc.target/aarch64/sve/gather_load_7.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_1.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_2.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_3.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_4.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_5.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256640
2018-01-13Add support for in-order addition reduction using SVE FADDARichard Sandiford1-0/+1
This patch adds support for in-order floating-point addition reductions, which are suitable even in strict IEEE mode. Previously vect_is_simple_reduction would reject any cases that forbid reassociation. The idea is instead to tentatively accept them as "FOLD_LEFT_REDUCTIONs" and only fail later if there is no support for them. Although this patch only handles the particular case of plus and minus on floating-point types, there's no reason in principle why we couldn't handle other cases. The reductions use a new fold_left_plus_optab if available, otherwise they fall back to elementwise additions or subtractions. The vect_force_simple_reduction change makes it easier for parloops to read the type of reduction. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * optabs.def (fold_left_plus_optab): New optab. * doc/md.texi (fold_left_plus_@var{m}): Document. * internal-fn.def (IFN_FOLD_LEFT_PLUS): New internal function. * internal-fn.c (fold_left_direct): Define. (expand_fold_left_optab_fn): Likewise. (direct_fold_left_optab_supported_p): Likewise. * fold-const-call.c (fold_const_fold_left): New function. (fold_const_call): Use it to fold CFN_FOLD_LEFT_PLUS. * tree-parloops.c (valid_reduction_p): New function. (gather_scalar_reductions): Use it. * tree-vectorizer.h (FOLD_LEFT_REDUCTION): New vect_reduction_type. (vect_finish_replace_stmt): Declare. * tree-vect-loop.c (fold_left_reduction_fn): New function. (needs_fold_left_reduction_p): New function, split out from... (vect_is_simple_reduction): ...here. Accept reductions that forbid reassociation, but give them type FOLD_LEFT_REDUCTION. (vect_force_simple_reduction): Also store the reduction type in the assignment's STMT_VINFO_REDUC_TYPE. (vect_model_reduction_cost): Handle FOLD_LEFT_REDUCTION. (merge_with_identity): New function. (vect_expand_fold_left): Likewise. (vectorize_fold_left_reduction): Likewise. (vectorizable_reduction): Handle FOLD_LEFT_REDUCTION. Leave the scalar phi in place for it. Check for target support and reject cases that would reassociate the operation. Defer the transform phase to vectorize_fold_left_reduction. * config/aarch64/aarch64.md (UNSPEC_FADDA): New unspec. * config/aarch64/aarch64-sve.md (fold_left_plus_<mode>): New expander. (*fold_left_plus_<mode>, *pred_fold_left_plus_<mode>): New insns. gcc/testsuite/ * gcc.dg/vect/no-fast-math-vect16.c: Expect the test to pass and check for a message about using in-order reductions. * gcc.dg/vect/pr79920.c: Expect both loops to be vectorized and check for a message about using in-order reductions. * gcc.dg/vect/trapv-vect-reduc-4.c: Expect all three loops to be vectorized and check for a message about using in-order reductions. Expect targets with variable-length vectors to fall back to the fixed-length mininum. * gcc.dg/vect/vect-reduc-6.c: Expect the loop to be vectorized and check for a message about using in-order reductions. * gcc.dg/vect/vect-reduc-in-order-1.c: New test. * gcc.dg/vect/vect-reduc-in-order-2.c: Likewise. * gcc.dg/vect/vect-reduc-in-order-3.c: Likewise. * gcc.dg/vect/vect-reduc-in-order-4.c: Likewise. * gcc.target/aarch64/sve/reduc_strict_1.c: New test. * gcc.target/aarch64/sve/reduc_strict_1_run.c: Likewise. * gcc.target/aarch64/sve/reduc_strict_2.c: Likewise. * gcc.target/aarch64/sve/reduc_strict_2_run.c: Likewise. * gcc.target/aarch64/sve/reduc_strict_3.c: Likewise. * gcc.target/aarch64/sve/slp_13.c: Add floating-point types. * gfortran.dg/vect/vect-8.f90: Expect 22 loops to be vectorized if vect_fold_left_plus. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256639
2018-01-13Add support for conditional reductions using SVE CLASTBRichard Sandiford1-0/+1
This patch uses SVE CLASTB to optimise conditional reductions. It means that we no longer need to maintain a separate index vector to record the most recent valid value, and no longer need to worry about overflow cases. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (fold_extract_last_@var{m}): Document. * doc/sourcebuild.texi (vect_fold_extract_last): Likewise. * optabs.def (fold_extract_last_optab): New optab. * internal-fn.def (FOLD_EXTRACT_LAST): New internal function. * internal-fn.c (fold_extract_direct): New macro. (expand_fold_extract_optab_fn): Likewise. (direct_fold_extract_optab_supported_p): Likewise. * tree-vectorizer.h (EXTRACT_LAST_REDUCTION): New vect_reduction_type. * tree-vect-loop.c (vect_model_reduction_cost): Handle EXTRACT_LAST_REDUCTION. (get_initial_def_for_reduction): Do not create an initial vector for EXTRACT_LAST_REDUCTION reductions. (vectorizable_reduction): Leave the scalar phi in place for EXTRACT_LAST_REDUCTIONs. Try using EXTRACT_LAST_REDUCTION ahead of INTEGER_INDUC_COND_REDUCTION. Do not check for an epilogue code for EXTRACT_LAST_REDUCTION and defer the transform phase to vectorizable_condition. * tree-vect-stmts.c (vect_finish_stmt_generation_1): New function, split out from... (vect_finish_stmt_generation): ...here. (vect_finish_replace_stmt): New function. (vectorizable_condition): Handle EXTRACT_LAST_REDUCTION. * config/aarch64/aarch64-sve.md (fold_extract_last_<mode>): New pattern. * config/aarch64/aarch64.md (UNSPEC_CLASTB): New unspec. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_fold_extract_last): New proc. * gcc.dg/vect/pr65947-1.c: Update dump messages. Add markup for fold_extract_last. * gcc.dg/vect/pr65947-2.c: Likewise. * gcc.dg/vect/pr65947-3.c: Likewise. * gcc.dg/vect/pr65947-4.c: Likewise. * gcc.dg/vect/pr65947-5.c: Likewise. * gcc.dg/vect/pr65947-6.c: Likewise. * gcc.dg/vect/pr65947-9.c: Likewise. * gcc.dg/vect/pr65947-10.c: Likewise. * gcc.dg/vect/pr65947-12.c: Likewise. * gcc.dg/vect/pr65947-14.c: Likewise. * gcc.dg/vect/pr80631-1.c: Likewise. * gcc.target/aarch64/sve/clastb_1.c: New test. * gcc.target/aarch64/sve/clastb_1_run.c: Likewise. * gcc.target/aarch64/sve/clastb_2.c: Likewise. * gcc.target/aarch64/sve/clastb_2_run.c: Likewise. * gcc.target/aarch64/sve/clastb_3.c: Likewise. * gcc.target/aarch64/sve/clastb_3_run.c: Likewise. * gcc.target/aarch64/sve/clastb_4.c: Likewise. * gcc.target/aarch64/sve/clastb_4_run.c: Likewise. * gcc.target/aarch64/sve/clastb_5.c: Likewise. * gcc.target/aarch64/sve/clastb_5_run.c: Likewise. * gcc.target/aarch64/sve/clastb_6.c: Likewise. * gcc.target/aarch64/sve/clastb_6_run.c: Likewise. * gcc.target/aarch64/sve/clastb_7.c: Likewise. * gcc.target/aarch64/sve/clastb_7_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256633
2018-01-13Add support for vectorising live-out values using SVE LASTBRichard Sandiford1-0/+2
This patch uses the SVE LASTB instruction to optimise cases in which a value produced by the final scalar iteration of a vectorised loop is live outside the loop. Previously this situation would stop us from using a fully-masked loop. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (extract_last_@var{m}): Document. * optabs.def (extract_last_optab): New optab. * internal-fn.def (EXTRACT_LAST): New internal function. * internal-fn.c (cond_unary_direct): New macro. (expand_cond_unary_optab_fn): Likewise. (direct_cond_unary_optab_supported_p): Likewise. * tree-vect-loop.c (vectorizable_live_operation): Allow fully-masked loops using EXTRACT_LAST. * config/aarch64/aarch64-sve.md (aarch64_sve_lastb<mode>): Rename to... (extract_last_<mode>): ...this optab. (vec_extract<mode><Vel>): Update accordingly. gcc/testsuite/ * gcc.target/aarch64/sve/live_1.c: New test. * gcc.target/aarch64/sve/live_1_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256632
2018-01-13Add support for reductions in fully-masked loopsRichard Sandiford1-0/+9
This patch removes the restriction that fully-masked loops cannot have reductions. The key thing here is to make sure that the reduction accumulator doesn't include any values associated with inactive lanes; the patch adds a bunch of conditional binary operations for doing that. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (cond_add@var{mode}, cond_sub@var{mode}) (cond_and@var{mode}, cond_ior@var{mode}, cond_xor@var{mode}) (cond_smin@var{mode}, cond_smax@var{mode}, cond_umin@var{mode}) (cond_umax@var{mode}): Document. * optabs.def (cond_add_optab, cond_sub_optab, cond_and_optab) (cond_ior_optab, cond_xor_optab, cond_smin_optab, cond_smax_optab) (cond_umin_optab, cond_umax_optab): New optabs. * internal-fn.def (COND_ADD, COND_SUB, COND_MIN, COND_MAX, COND_AND) (COND_IOR, COND_XOR): New internal functions. * internal-fn.h (get_conditional_internal_fn): Declare. * internal-fn.c (cond_binary_direct): New macro. (expand_cond_binary_optab_fn): Likewise. (direct_cond_binary_optab_supported_p): Likewise. (get_conditional_internal_fn): New function. * tree-vect-loop.c (vectorizable_reduction): Handle fully-masked loops. Cope with reduction statements that are vectorized as calls rather than assignments. * config/aarch64/aarch64-sve.md (cond_<optab><mode>): New insns. * config/aarch64/iterators.md (UNSPEC_COND_ADD, UNSPEC_COND_SUB) (UNSPEC_COND_SMAX, UNSPEC_COND_UMAX, UNSPEC_COND_SMIN) (UNSPEC_COND_UMIN, UNSPEC_COND_AND, UNSPEC_COND_ORR) (UNSPEC_COND_EOR): New unspecs. (optab): Add mappings for them. (SVE_COND_INT_OP, SVE_COND_FP_OP): New int iterators. (sve_int_op, sve_fp_op): New int attributes. gcc/testsuite/ * gcc.dg/vect/pr60482.c: Remove XFAIL for variable-length vectors. * gcc.target/aarch64/sve/reduc_1.c: Expect the loop operations to be predicated. * gcc.target/aarch64/sve/slp_5.c: Check for a fully-masked loop. * gcc.target/aarch64/sve/slp_7.c: Likewise. * gcc.target/aarch64/sve/reduc_5.c: New test. * gcc.target/aarch64/sve/slp_13.c: Likewise. * gcc.target/aarch64/sve/slp_13_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256626
2018-01-13Add support for fully-predicated loopsRichard Sandiford1-0/+2
This patch adds support for using a single fully-predicated loop instead of a vector loop and a scalar tail. An SVE WHILELO instruction generates the predicate for each iteration of the loop, given the current scalar iv value and the loop bound. This operation is wrapped up in a new internal function called WHILE_ULT. E.g.: WHILE_ULT (0, 3, { 0, 0, 0, 0}) -> { 1, 1, 1, 0 } WHILE_ULT (UINT_MAX - 1, UINT_MAX, { 0, 0, 0, 0 }) -> { 1, 0, 0, 0 } The third WHILE_ULT argument is needed to make the operation unambiguous: without it, WHILE_ULT (0, 3) for one vector type would seem equivalent to WHILE_ULT (0, 3) for another, even if the types have different numbers of elements. Note that the patch uses "mask" and "fully-masked" instead of "predicate" and "fully-predicated", to follow existing GCC terminology. This patch just handles the simple cases, punting for things like reductions and live-out values. Later patches remove most of these restrictions. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * optabs.def (while_ult_optab): New optab. * doc/md.texi (while_ult@var{m}@var{n}): Document. * internal-fn.def (WHILE_ULT): New internal function. * internal-fn.h (direct_internal_fn_supported_p): New override that takes two types as argument. * internal-fn.c (while_direct): New macro. (expand_while_optab_fn): New function. (convert_optab_supported_p): Likewise. (direct_while_optab_supported_p): New macro. * wide-int.h (wi::udiv_ceil): New function. * tree-vectorizer.h (rgroup_masks): New structure. (vec_loop_masks): New typedef. (_loop_vec_info): Add masks, mask_compare_type, can_fully_mask_p and fully_masked_p. (LOOP_VINFO_CAN_FULLY_MASK_P, LOOP_VINFO_FULLY_MASKED_P) (LOOP_VINFO_MASKS, LOOP_VINFO_MASK_COMPARE_TYPE): New macros. (vect_max_vf): New function. (slpeel_make_loop_iterate_ntimes): Delete. (vect_set_loop_condition, vect_get_loop_mask_type, vect_gen_while) (vect_halve_mask_nunits, vect_double_mask_nunits): Declare. (vect_record_loop_mask, vect_get_loop_mask): Likewise. * tree-vect-loop-manip.c: Include tree-ssa-loop-niter.h, internal-fn.h, stor-layout.h and optabs-query.h. (vect_set_loop_mask): New function. (add_preheader_seq): Likewise. (add_header_seq): Likewise. (interleave_supported_p): Likewise. (vect_maybe_permute_loop_masks): Likewise. (vect_set_loop_masks_directly): Likewise. (vect_set_loop_condition_masked): Likewise. (vect_set_loop_condition_unmasked): New function, split out from slpeel_make_loop_iterate_ntimes. (slpeel_make_loop_iterate_ntimes): Rename to.. (vect_set_loop_condition): ...this. Use vect_set_loop_condition_masked for fully-masked loops and vect_set_loop_condition_unmasked otherwise. (vect_do_peeling): Update call accordingly. (vect_gen_vector_loop_niters): Use VF as the step for fully-masked loops. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize mask_compare_type, can_fully_mask_p and fully_masked_p. (release_vec_loop_masks): New function. (_loop_vec_info): Use it to free the loop masks. (can_produce_all_loop_masks_p): New function. (vect_get_max_nscalars_per_iter): Likewise. (vect_verify_full_masking): Likewise. (vect_analyze_loop_2): Save LOOP_VINFO_CAN_FULLY_MASK_P around retries, and free the mask rgroups before retrying. Check loop-wide reasons for disallowing fully-masked loops. Make the final decision about whether use a fully-masked loop or not. (vect_estimate_min_profitable_iters): Do not assume that peeling for the number of iterations will be needed for fully-masked loops. (vectorizable_reduction): Disable fully-masked loops. (vectorizable_live_operation): Likewise. (vect_halve_mask_nunits): New function. (vect_double_mask_nunits): Likewise. (vect_record_loop_mask): Likewise. (vect_get_loop_mask): Likewise. (vect_transform_loop): Handle the case in which the final loop iteration might handle a partial vector. Call vect_set_loop_condition instead of slpeel_make_loop_iterate_ntimes. * tree-vect-stmts.c: Include tree-ssa-loop-niter.h and gimple-fold.h. (check_load_store_masking): New function. (prepare_load_store_mask): Likewise. (vectorizable_store): Handle fully-masked loops. (vectorizable_load): Likewise. (supportable_widening_operation): Use vect_halve_mask_nunits for booleans. (supportable_narrowing_operation): Likewise vect_double_mask_nunits. (vect_gen_while): New function. * config/aarch64/aarch64.md (umax<mode>3): New expander. (aarch64_uqdec<mode>): New insn. gcc/testsuite/ * gcc.dg/tree-ssa/cunroll-10.c: Disable vectorization. * gcc.dg/tree-ssa/peel1.c: Likewise. * gcc.dg/vect/vect-load-lanes-peeling-1.c: Remove XFAIL for variable-length vectors. * gcc.target/aarch64/sve/vcond_6.c: XFAIL test for AND. * gcc.target/aarch64/sve/vec_bool_cmp_1.c: Expect BIC instead of NOT. * gcc.target/aarch64/sve/slp_1.c: Check for a fully-masked loop. * gcc.target/aarch64/sve/slp_2.c: Likewise. * gcc.target/aarch64/sve/slp_3.c: Likewise. * gcc.target/aarch64/sve/slp_4.c: Likewise. * gcc.target/aarch64/sve/slp_6.c: Likewise. * gcc.target/aarch64/sve/slp_8.c: New test. * gcc.target/aarch64/sve/slp_8_run.c: Likewise. * gcc.target/aarch64/sve/slp_9.c: Likewise. * gcc.target/aarch64/sve/slp_9_run.c: Likewise. * gcc.target/aarch64/sve/slp_10.c: Likewise. * gcc.target/aarch64/sve/slp_10_run.c: Likewise. * gcc.target/aarch64/sve/slp_11.c: Likewise. * gcc.target/aarch64/sve/slp_11_run.c: Likewise. * gcc.target/aarch64/sve/slp_12.c: Likewise. * gcc.target/aarch64/sve/slp_12_run.c: Likewise. * gcc.target/aarch64/sve/ld1r_2.c: Likewise. * gcc.target/aarch64/sve/ld1r_2_run.c: Likewise. * gcc.target/aarch64/sve/while_1.c: Likewise. * gcc.target/aarch64/sve/while_2.c: Likewise. * gcc.target/aarch64/sve/while_3.c: Likewise. * gcc.target/aarch64/sve/while_4.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256625
2018-01-13Add support for bitwise reductionsRichard Sandiford1-0/+3
This patch adds support for the SVE bitwise reduction instructions (ANDV, ORV and EORV). It's a fairly mechanical extension of existing REDUC_* operators. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * optabs.def (reduc_and_scal_optab, reduc_ior_scal_optab) (reduc_xor_scal_optab): New optabs. * doc/md.texi (reduc_and_scal_@var{m}, reduc_ior_scal_@var{m}) (reduc_xor_scal_@var{m}): Document. * doc/sourcebuild.texi (vect_logical_reduc): Likewise. * internal-fn.def (IFN_REDUC_AND, IFN_REDUC_IOR, IFN_REDUC_XOR): New internal functions. * fold-const-call.c (fold_const_call): Handle them. * tree-vect-loop.c (reduction_fn_for_scalar_code): Return the new internal functions for BIT_AND_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR. * config/aarch64/aarch64-sve.md (reduc_<bit_reduc>_scal_<mode>): (*reduc_<bit_reduc>_scal_<mode>): New patterns. * config/aarch64/iterators.md (UNSPEC_ANDV, UNSPEC_ORV) (UNSPEC_XORV): New unspecs. (optab): Add entries for them. (BITWISEV): New int iterator. (bit_reduc_op): New int attributes. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_logical_reduc): New proc. * gcc.dg/vect/vect-reduc-or_1.c: Also run for vect_logical_reduc and add an associated scan-dump test. Prevent vectorization of the first two loops. * gcc.dg/vect/vect-reduc-or_2.c: Likewise. * gcc.target/aarch64/sve/reduc_1.c: Add AND, IOR and XOR reductions. * gcc.target/aarch64/sve/reduc_2.c: Likewise. * gcc.target/aarch64/sve/reduc_1_run.c: Likewise. (INIT_VECTOR): Tweak initial value so that some bits are always set. * gcc.target/aarch64/sve/reduc_2_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256624
2018-01-13SLP reductions with variable-length vectorsRichard Sandiford1-0/+1
Two things stopped us using SLP reductions with variable-length vectors: (1) We didn't have a way of constructing the initial vector. This patch does it by creating a vector full of the neutral identity value and then using a shift-and-insert function to insert any non-identity inputs into the low-numbered elements. (The non-identity values are needed for double reductions.) Alternatively, for unchained MIN/MAX reductions that have no neutral value, we instead use the same duplicate-and-interleave approach as for SLP constant and external definitions (added by a previous patch). (2) The epilogue for constant-length vectors would extract the vector elements associated with each SLP statement and do scalar arithmetic on these individual elements. For variable-length vectors, the patch instead creates a reduction vector for each SLP statement, replacing the elements for other SLP statements with the identity value. It then uses a hardware reduction instruction on each vector. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (vec_shl_insert_@var{m}): New optab. * internal-fn.def (VEC_SHL_INSERT): New internal function. * optabs.def (vec_shl_insert_optab): New optab. * tree-vectorizer.h (can_duplicate_and_interleave_p): Declare. (duplicate_and_interleave): Likewise. * tree-vect-loop.c: Include internal-fn.h. (neutral_op_for_slp_reduction): New function, split out from get_initial_defs_for_reduction. (get_initial_def_for_reduction): Handle option 2 for variable-length vectors by loading the neutral value into a vector and then shifting the initial value into element 0. (get_initial_defs_for_reduction): Replace the code argument with the neutral value calculated by neutral_op_for_slp_reduction. Use gimple_build_vector for constant-length vectors. Use IFN_VEC_SHL_INSERT for variable-length vectors if all but the first group_size elements have a neutral value. Use duplicate_and_interleave otherwise. (vect_create_epilog_for_reduction): Take a neutral_op parameter. Update call to get_initial_defs_for_reduction. Handle SLP reductions for variable-length vectors by creating one vector result for each scalar result, with the elements associated with other scalar results stubbed out with the neutral value. (vectorizable_reduction): Call neutral_op_for_slp_reduction. Require IFN_VEC_SHL_INSERT for double reductions on variable-length vectors, or SLP reductions that have a neutral value. Require can_duplicate_and_interleave_p support for variable-length unchained SLP reductions if there is no neutral value, such as for MIN/MAX reductions. Also require the number of vector elements to be a multiple of the number of SLP statements when doing variable-length unchained SLP reductions. Update call to vect_create_epilog_for_reduction. * tree-vect-slp.c (can_duplicate_and_interleave_p): Make public and remove initial values. (duplicate_and_interleave): Make public. * config/aarch64/aarch64.md (UNSPEC_INSR): New unspec. * config/aarch64/aarch64-sve.md (vec_shl_insert_<mode>): New insn. gcc/testsuite/ * gcc.dg/vect/pr37027.c: Remove XFAIL for variable-length vectors. * gcc.dg/vect/pr67790.c: Likewise. * gcc.dg/vect/slp-reduc-1.c: Likewise. * gcc.dg/vect/slp-reduc-2.c: Likewise. * gcc.dg/vect/slp-reduc-3.c: Likewise. * gcc.dg/vect/slp-reduc-5.c: Likewise. * gcc.target/aarch64/sve/slp_5.c: New test. * gcc.target/aarch64/sve/slp_5_run.c: Likewise. * gcc.target/aarch64/sve/slp_6.c: Likewise. * gcc.target/aarch64/sve/slp_6_run.c: Likewise. * gcc.target/aarch64/sve/slp_7.c: Likewise. * gcc.target/aarch64/sve/slp_7_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256623
2018-01-13Add support for masked load/store_lanesRichard Sandiford1-0/+2
This patch adds support for vectorising groups of IFN_MASK_LOADs and IFN_MASK_STOREs using conditional load/store-lanes instructions. This requires new internal functions to represent the result (IFN_MASK_{LOAD,STORE}_LANES), as well as associated optabs. The normal IFN_{LOAD,STORE}_LANES functions are const operations that logically just perform the permute: the load or store is encoded as a MEM operand to the call statement. In contrast, the IFN_MASK_{LOAD,STORE}_LANES functions use the same kind of interface as IFN_MASK_{LOAD,STORE}, since the memory is only conditionally accessed. The AArch64 patterns were added as part of the main LD[234]/ST[234] patch. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (vec_mask_load_lanes@var{m}@var{n}): Document. (vec_mask_store_lanes@var{m}@var{n}): Likewise. * optabs.def (vec_mask_load_lanes_optab): New optab. (vec_mask_store_lanes_optab): Likewise. * internal-fn.def (MASK_LOAD_LANES): New internal function. (MASK_STORE_LANES): Likewise. * internal-fn.c (mask_load_lanes_direct): New macro. (mask_store_lanes_direct): Likewise. (expand_mask_load_optab_fn): Handle masked operations. (expand_mask_load_lanes_optab_fn): New macro. (expand_mask_store_optab_fn): Handle masked operations. (expand_mask_store_lanes_optab_fn): New macro. (direct_mask_load_lanes_optab_supported_p): Likewise. (direct_mask_store_lanes_optab_supported_p): Likewise. * tree-vectorizer.h (vect_store_lanes_supported): Take a masked_p parameter. (vect_load_lanes_supported): Likewise. * tree-vect-data-refs.c (strip_conversion): New function. (can_group_stmts_p): Likewise. (vect_analyze_data_ref_accesses): Use it instead of checking for a pair of assignments. (vect_store_lanes_supported): Take a masked_p parameter. (vect_load_lanes_supported): Likewise. * tree-vect-loop.c (vect_analyze_loop_2): Update calls to vect_store_lanes_supported and vect_load_lanes_supported. * tree-vect-slp.c (vect_analyze_slp_instance): Likewise. * tree-vect-stmts.c (get_group_load_store_type): Take a masked_p parameter. Don't allow gaps for masked accesses. Use vect_get_store_rhs. Update calls to vect_store_lanes_supported and vect_load_lanes_supported. (get_load_store_type): Take a masked_p parameter and update call to get_group_load_store_type. (vectorizable_store): Update call to get_load_store_type. Handle IFN_MASK_STORE_LANES. (vectorizable_load): Update call to get_load_store_type. Handle IFN_MASK_LOAD_LANES. gcc/testsuite/ * gcc.dg/vect/vect-ooo-group-1.c: New test. * gcc.target/aarch64/sve/mask_struct_load_1.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_1_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_2.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_2_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_3_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_4.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_5.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_6.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_7.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_8.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_1.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_1_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_2.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_2_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_3.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_3_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_4.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256620
2018-01-03Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r256169
2018-01-02Remove vec_perm_const optabRichard Sandiford1-1/+0
One of the changes needed for variable-length VEC_PERM_EXPRs -- and for long fixed-length VEC_PERM_EXPRs -- is the ability to use constant selectors that wouldn't fit in the vectors being permuted. E.g. a permute on two V256QIs can't be done using a V256QI selector. At the moment constant permutes use two interfaces: targetm.vectorizer.vec_perm_const_ok for testing whether a permute is valid and the vec_perm_const optab for actually emitting the permute. The former gets passed a vec<> selector and the latter an rtx selector. Most ports share a lot of code between the hook and the optab, with a wrapper function for each interface. We could try to keep that interface and require ports to define wider vector modes that could be attached to the CONST_VECTOR (e.g. V256HI or V256SI in the example above). But building a CONST_VECTOR rtx seems a bit pointless here, since the expand code only creates the CONST_VECTOR in order to call the optab, and the first thing the target does is take the CONST_VECTOR apart again. The easiest approach therefore seemed to be to remove the optab and reuse the target hook to emit the code. One potential drawback is that it's no longer possible to use match_operand predicates to force operands into the required form, but in practice all targets want register operands anyway. The patch also changes vec_perm_indices into a class that provides some simple routines for handling permutations. A later patch will flesh this out and get rid of auto_vec_perm_indices, but I didn't want to do all that in this patch and make it more complicated than it already is. 2018-01-02 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * Makefile.in (OBJS): Add vec-perm-indices.o. * vec-perm-indices.h: New file. * vec-perm-indices.c: Likewise. * target.h (vec_perm_indices): Replace with a forward class declaration. (auto_vec_perm_indices): Move to vec-perm-indices.h. * optabs.h: Include vec-perm-indices.h. (expand_vec_perm): Delete. (selector_fits_mode_p, expand_vec_perm_var): Declare. (expand_vec_perm_const): Declare. * target.def (vec_perm_const_ok): Replace with... (vec_perm_const): ...this new hook. * doc/tm.texi.in (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Replace with... (TARGET_VECTORIZE_VEC_PERM_CONST): ...this new hook. * doc/tm.texi: Regenerate. * optabs.def (vec_perm_const): Delete. * doc/md.texi (vec_perm_const): Likewise. (vec_perm): Refer to TARGET_VECTORIZE_VEC_PERM_CONST. * expr.c (expand_expr_real_2): Use expand_vec_perm_const rather than expand_vec_perm for constant permutation vectors. Assert that the mode of variable permutation vectors is the integer equivalent of the mode that is being permuted. * optabs-query.h (selector_fits_mode_p): Declare. * optabs-query.c: Include vec-perm-indices.h. (selector_fits_mode_p): New function. (can_vec_perm_const_p): Check whether targetm.vectorize.vec_perm_const is defined, instead of checking whether the vec_perm_const_optab exists. Use targetm.vectorize.vec_perm_const instead of targetm.vectorize.vec_perm_const_ok. Check whether the indices fit in the vector mode before using a variable permute. * optabs.c (shift_amt_for_vec_perm_mask): Take a mode and a vec_perm_indices instead of an rtx. (expand_vec_perm): Replace with... (expand_vec_perm_const): ...this new function. Take the selector as a vec_perm_indices rather than an rtx. Also take the mode of the selector. Update call to shift_amt_for_vec_perm_mask. Use targetm.vectorize.vec_perm_const instead of vec_perm_const_optab. Use vec_perm_indices::new_expanded_vector to expand the original selector into bytes. Check whether the indices fit in the vector mode before using a variable permute. (expand_vec_perm_var): Make global. (expand_mult_highpart): Use expand_vec_perm_const. * fold-const.c: Includes vec-perm-indices.h. * tree-ssa-forwprop.c: Likewise. * tree-vect-data-refs.c: Likewise. * tree-vect-generic.c: Likewise. * tree-vect-loop.c: Likewise. * tree-vect-slp.c: Likewise. * tree-vect-stmts.c: Likewise. * config/aarch64/aarch64-protos.h (aarch64_expand_vec_perm_const): Delete. * config/aarch64/aarch64-simd.md (vec_perm_const<mode>): Delete. * config/aarch64/aarch64.c (aarch64_expand_vec_perm_const) (aarch64_vectorize_vec_perm_const_ok): Fuse into... (aarch64_vectorize_vec_perm_const): ...this new function. (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete. (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine. * config/arm/arm-protos.h (arm_expand_vec_perm_const): Delete. * config/arm/vec-common.md (vec_perm_const<mode>): Delete. * config/arm/arm.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete. (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine. (arm_expand_vec_perm_const, arm_vectorize_vec_perm_const_ok): Merge into... (arm_vectorize_vec_perm_const): ...this new function. Explicitly check for NEON modes. * config/i386/i386-protos.h (ix86_expand_vec_perm_const): Delete. * config/i386/sse.md (VEC_PERM_CONST, vec_perm_const<mode>): Delete. * config/i386/i386.c (ix86_expand_vec_perm_const_1): Update comment. (ix86_expand_vec_perm_const, ix86_vectorize_vec_perm_const_ok): Merge into... (ix86_vectorize_vec_perm_const): ...this new function. Incorporate the old VEC_PERM_CONST conditions. * config/ia64/ia64-protos.h (ia64_expand_vec_perm_const): Delete. * config/ia64/vect.md (vec_perm_const<mode>): Delete. * config/ia64/ia64.c (ia64_expand_vec_perm_const) (ia64_vectorize_vec_perm_const_ok): Merge into... (ia64_vectorize_vec_perm_const): ...this new function. * config/mips/loongson.md (vec_perm_const<mode>): Delete. * config/mips/mips-msa.md (vec_perm_const<mode>): Delete. * config/mips/mips-ps-3d.md (vec_perm_constv2sf): Delete. * config/mips/mips-protos.h (mips_expand_vec_perm_const): Delete. * config/mips/mips.c (mips_expand_vec_perm_const) (mips_vectorize_vec_perm_const_ok): Merge into... (mips_vectorize_vec_perm_const): ...this new function. * config/powerpcspe/altivec.md (vec_perm_constv16qi): Delete. * config/powerpcspe/paired.md (vec_perm_constv2sf): Delete. * config/powerpcspe/spe.md (vec_perm_constv2si): Delete. * config/powerpcspe/vsx.md (vec_perm_const<mode>): Delete. * config/powerpcspe/powerpcspe-protos.h (altivec_expand_vec_perm_const) (rs6000_expand_vec_perm_const): Delete. * config/powerpcspe/powerpcspe.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete. (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine. (altivec_expand_vec_perm_const_le): Take each operand individually. Operate on constant selectors rather than rtxes. (altivec_expand_vec_perm_const): Likewise. Update call to altivec_expand_vec_perm_const_le. (rs6000_expand_vec_perm_const): Delete. (rs6000_vectorize_vec_perm_const_ok): Delete. (rs6000_vectorize_vec_perm_const): New function. (rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of an element count and rtx array. (rs6000_expand_extract_even): Update call accordingly. (rs6000_expand_interleave): Likewise. * config/rs6000/altivec.md (vec_perm_constv16qi): Delete. * config/rs6000/paired.md (vec_perm_constv2sf): Delete. * config/rs6000/vsx.md (vec_perm_const<mode>): Delete. * config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_const) (rs6000_expand_vec_perm_const): Delete. * config/rs6000/rs6000.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete. (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine. (altivec_expand_vec_perm_const_le): Take each operand individually. Operate on constant selectors rather than rtxes. (altivec_expand_vec_perm_const): Likewise. Update call to altivec_expand_vec_perm_const_le. (rs6000_expand_vec_perm_const): Delete. (rs6000_vectorize_vec_perm_const_ok): Delete. (rs6000_vectorize_vec_perm_const): New function. Remove stray reference to the SPE evmerge intructions. (rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of an element count and rtx array. (rs6000_expand_extract_even): Update call accordingly. (rs6000_expand_interleave): Likewise. * config/sparc/sparc.md (vec_perm_constv8qi): Delete in favor of... * config/sparc/sparc.c (sparc_vectorize_vec_perm_const): ...this new function. (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine. From-SVN: r256093
2017-12-16Add VEC_SERIES_EXPR and associated optabRichard Sandiford1-0/+1
Similarly to the VEC_DUPLICATE_EXPR, this patch adds a tree code equivalent of the VEC_SERIES rtx code: VEC_SERIES_EXPR. 2017-12-16 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/generic.texi (VEC_SERIES_EXPR): Document. * doc/md.texi (vec_series@var{m}): Document. * tree.def (VEC_SERIES_EXPR): New tree code. * tree.h (build_vec_series): Declare. * tree.c (build_vec_series): New function. * cfgexpand.c (expand_debug_expr): Handle VEC_SERIES_EXPR. * tree-pretty-print.c (dump_generic_node): Likewise. * gimple-pretty-print.c (dump_binary_rhs): Likewise. * tree-inline.c (estimate_operator_cost): Likewise. * expr.c (expand_expr_real_2): Likewise. * optabs-tree.c (optab_for_tree_code): Likewise. * tree-cfg.c (verify_gimple_assign_binary): Likewise. * fold-const.c (const_binop): Fold VEC_SERIES_EXPRs of constants. * expmed.c (make_tree): Handle VEC_SERIES. * optabs.def (vec_series_optab): New optab. * optabs.h (expand_vec_series_expr): Declare. * optabs.c (expand_vec_series_expr): New function. * tree-vect-generic.c (expand_vector_operations_1): Check that the operands also have vector type. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r255741
2017-12-16Add VEC_DUPLICATE_EXPR and associated optabRichard Sandiford1-0/+2
SVE needs a way of broadcasting a scalar to a variable-length vector. This patch adds VEC_DUPLICATE_EXPR for when CONSTRUCTOR would be used for fixed-length vectors; this is the tree equivalent of the existing rtl code VEC_DUPLICATE. The patch also adds a vec_duplicate_optab to go with VEC_DUPLICATE_EXPR. 2017-12-16 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hawyard@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/generic.texi (VEC_DUPLICATE_EXPR): Document. (VEC_COND_EXPR): Add missing @tindex. * doc/md.texi (vec_duplicate@var{m}): Document. * tree.def (VEC_DUPLICATE_EXPR): New tree codes. * tree.c (build_vector_from_val): Add stubbed-out handling of variable-length vectors, using VEC_DUPLICATE_EXPR. (uniform_vector_p): Handle VEC_DUPLICATE_EXPR. * cfgexpand.c (expand_debug_expr): Likewise. * tree-cfg.c (verify_gimple_assign_unary): Likewise. * tree-inline.c (estimate_operator_cost): Likewise. * tree-pretty-print.c (dump_generic_node): Likewise. * tree-vect-generic.c (ssa_uniform_vector_p): Likewise. * fold-const.c (const_unop): Fold VEC_DUPLICATE_EXPRs of a constant. (test_vec_duplicate_folding): New function. (fold_const_c_tests): Call it. * optabs.def (vec_duplicate_optab): New optab. * optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR. * optabs.h (expand_vector_broadcast): Declare. * optabs.c (expand_vector_broadcast): Make non-static. Try using vec_duplicate_optab. * expr.c (store_constructor): Try using vec_duplicate_optab for uniform vectors. (expand_expr_real_2): Handle VEC_DUPLICATE_EXPR. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r255740
2017-08-08re PR middle-end/19706 (Recognize common Fortran usages of copysign.)Tamar Christina1-0/+1
2017-08-08 Tamar Christina <tamar.christina@arm.com> Andrew Pinski <pinskia@gmail.com> PR middle-end/19706 * internal-fn.def (XORSIGN): New. * optabs.def (xorsign_optab): New. * tree-ssa-math-opts.c (is_copysign_call_with_1): New. (convert_expand_mult_copysign): New. (pass_optimize_widening_mul::execute): Call convert_expand_mult_copysign. Co-Authored-By: Andrew Pinski <pinskia@gmail.com> From-SVN: r250956
2017-08-01re PR target/80846 (auto-vectorized AVX2 horizontal sum should narrow to ↵Jakub Jelinek1-2/+2
128b right away, to be more efficient for Ryzen and Intel) PR target/80846 * optabs.def (vec_extract_optab, vec_init_optab): Change from a direct optab to conversion optab. * optabs.c (expand_vector_broadcast): Use convert_optab_handler with GET_MODE_INNER as last argument instead of optab_handler. * expmed.c (extract_bit_field_1): Likewise. Use vector from vector extraction if possible and optab is available. * expr.c (store_constructor): Use convert_optab_handler instead of optab_handler. Use vector initialization from smaller vectors if possible and optab is available. * tree-vect-stmts.c (vectorizable_load): Likewise. * doc/md.texi (vec_extract, vec_init): Document that the optabs now have two modes. * config/i386/i386.c (ix86_expand_vector_init): Handle expansion of vec_init from half-sized vectors with the same element mode. * config/i386/sse.md (ssehalfvecmode): Add V4TI case. (ssehalfvecmodelower, ssescalarmodelower): New mode attributes. (reduc_plus_scal_v8df, reduc_plus_scal_v4df, reduc_plus_scal_v2df, reduc_plus_scal_v16sf, reduc_plus_scal_v8sf, reduc_plus_scal_v4sf, reduc_<code>_scal_<mode>, reduc_umin_scal_v8hi): Add element mode after mode in gen_vec_extract* calls. (vec_extract<mode>): Renamed to ... (vec_extract<mode><ssescalarmodelower>): ... this. (vec_extract<mode><ssehalfvecmodelower>): New expander. (rotl<mode>3, rotr<mode>3, <shift_insn><mode>3, ashrv2di3): Add element mode after mode in gen_vec_init* calls. (VEC_INIT_HALF_MODE): New mode iterator. (vec_init<mode>): Renamed to ... (vec_init<mode><ssescalarmodelower>): ... this. (vec_init<mode><ssehalfvecmodelower>): New expander. * config/i386/mmx.md (vec_extractv2sf): Renamed to ... (vec_extractv2sfsf): ... this. (vec_initv2sf): Renamed to ... (vec_initv2sfsf): ... this. (vec_extractv2si): Renamed to ... (vec_extractv2sisi): ... this. (vec_initv2si): Renamed to ... (vec_initv2sisi): ... this. (vec_extractv4hi): Renamed to ... (vec_extractv4hihi): ... this. (vec_initv4hi): Renamed to ... (vec_initv4hihi): ... this. (vec_extractv8qi): Renamed to ... (vec_extractv8qiqi): ... this. (vec_initv8qi): Renamed to ... (vec_initv8qiqi): ... this. * config/rs6000/vector.md (VEC_base_l): New mode attribute. (vec_init<mode>): Renamed to ... (vec_init<mode><VEC_base_l>): ... this. (vec_extract<mode>): Renamed to ... (vec_extract<mode><VEC_base_l>): ... this. * config/rs6000/paired.md (vec_initv2sf): Renamed to ... (vec_initv2sfsf): ... this. * config/rs6000/altivec.md (splitter, altivec_copysign_v4sf3, vec_unpacku_hi_v16qi, vec_unpacku_hi_v8hi, vec_unpacku_lo_v16qi, vec_unpacku_lo_v8hi, mulv16qi3, altivec_vreve<mode>2): Add element mode after mode in gen_vec_init* calls. * config/aarch64/aarch64-simd.md (vec_init<mode>): Renamed to ... (vec_init<mode><Vel>): ... this. (vec_extract<mode>): Renamed to ... (vec_extract<mode><Vel>): ... this. * config/aarch64/iterators.md (Vel): New mode attribute. * config/s390/s390.c (s390_expand_vec_strlen, s390_expand_vec_movstr): Add element mode after mode in gen_vec_extract* calls. * config/s390/vector.md (non_vec_l): New mode attribute. (vec_extract<mode>): Renamed to ... (vec_extract<mode><non_vec_l>): ... this. (vec_init<mode>): Renamed to ... (vec_init<mode><non_vec_l>): ... this. * config/s390/s390-builtins.def (s390_vlgvb, s390_vlgvh, s390_vlgvf, s390_vlgvf_flt, s390_vlgvg, s390_vlgvg_dbl): Add element mode after vec_extract mode. * config/arm/iterators.md (V_elem_l): New mode attribute. * config/arm/neon.md (vec_extract<mode>): Renamed to ... (vec_extract<mode><V_elem_l>): ... this. (vec_extractv2di): Renamed to ... (vec_extractv2didi): ... this. (vec_init<mode>): Renamed to ... (vec_init<mode><V_elem_l>): ... this. (reduc_plus_scal_<mode>, reduc_plus_scal_v2di, reduc_smin_scal_<mode>, reduc_smax_scal_<mode>, reduc_umin_scal_<mode>, reduc_umax_scal_<mode>, neon_vget_lane<mode>, neon_vget_laneu<mode>): Add element mode after gen_vec_extract* calls. * config/mips/mips-msa.md (vec_init<mode>): Renamed to ... (vec_init<mode><unitmode>): ... this. (vec_extract<mode>): Renamed to ... (vec_extract<mode><unitmode>): ... this. * config/mips/loongson.md (vec_init<mode>): Renamed to ... (vec_init<mode><unitmode>): ... this. * config/mips/mips-ps-3d.md (vec_initv2sf): Renamed to ... (vec_initv2sfsf): ... this. (vec_extractv2sf): Renamed to ... (vec_extractv2sfsf): ... this. (reduc_plus_scal_v2sf, reduc_smin_scal_v2sf, reduc_smax_scal_v2sf): Add element mode after gen_vec_extract* calls. * config/mips/mips.md (unitmode): New mode iterator. * config/spu/spu.c (spu_expand_prologue, spu_allocate_stack, spu_builtin_extract): Add element mode after gen_vec_extract* calls. * config/spu/spu.md (inner_l): New mode attribute. (vec_init<mode>): Renamed to ... (vec_init<mode><inner_l>): ... this. (vec_extract<mode>): Renamed to ... (vec_extract<mode><inner_l>): ... this. * config/sparc/sparc.md (veltmode): New mode iterator. (vec_init<VMALL:mode>): Renamed to ... (vec_init<VMALL:mode><VMALL:veltmode>): ... this. * config/ia64/vect.md (vec_initv2si): Renamed to ... (vec_initv2sisi): ... this. (vec_initv2sf): Renamed to ... (vec_initv2sfsf): ... this. (vec_extractv2sf): Renamed to ... (vec_extractv2sfsf): ... this. * config/powerpcspe/vector.md (VEC_base_l): New mode attribute. (vec_init<mode>): Renamed to ... (vec_init<mode><VEC_base_l>): ... this. (vec_extract<mode>): Renamed to ... (vec_extract<mode><VEC_base_l>): ... this. * config/powerpcspe/paired.md (vec_initv2sf): Renamed to ... (vec_initv2sfsf): ... this. * config/powerpcspe/altivec.md (splitter, altivec_copysign_v4sf3, vec_unpacku_hi_v16qi, vec_unpacku_hi_v8hi, vec_unpacku_lo_v16qi, vec_unpacku_lo_v8hi, mulv16qi3): Add element mode after mode in gen_vec_init* calls. From-SVN: r250759
2017-01-01Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r243994
2016-10-25re PR target/78102 (GCC refuses to generate PCMPEQQ instruction for SSE4.1)Jakub Jelinek1-0/+2
PR target/78102 * optabs.def (vcondeq_optab, vec_cmpeq_optab): New optabs. * optabs.c (expand_vec_cond_expr): For comparison codes EQ_EXPR and NE_EXPR, attempt vcondeq_optab as fallback. (expand_vec_cmp_expr): For comparison codes EQ_EXPR and NE_EXPR, attempt vec_cmpeq_optab as fallback. * optabs-tree.h (expand_vec_cmp_expr_p, expand_vec_cond_expr_p): Add enum tree_code argument. * optabs-query.h (get_vec_cmp_eq_icode, get_vcond_eq_icode): New inline functions. * optabs-tree.c (expand_vec_cmp_expr_p): Add CODE argument. For CODE EQ_EXPR or NE_EXPR, attempt to use vec_cmpeq_optab as fallback. (expand_vec_cond_expr_p): Add CODE argument. For CODE EQ_EXPR or NE_EXPR, attempt to use vcondeq_optab as fallback. * tree-vect-generic.c (expand_vector_comparison, expand_vector_divmod, expand_vector_condition): Adjust expand_vec_cmp_expr_p and expand_vec_cond_expr_p callers. * tree-vect-stmts.c (vectorizable_condition, vectorizable_comparison): Likewise. * tree-vect-patterns.c (vect_recog_mixed_size_cond_pattern, check_bool_pattern, search_type_for_mask_1): Likewise. * expr.c (do_store_flag): Likewise. * doc/md.texi (@code{vec_cmpeq@var{m}@var{n}}, @code{vcondeq@var{m}@var{n}}): Document. * config/i386/sse.md (vec_cmpeqv2div2di, vcondeq<VI8F_128:mode>v2di): New expanders. testsuite/ * gcc.target/i386/pr78102.c: New test. From-SVN: r241525
2016-10-14optabs.def: Remove optab function gen_int_libfunc for sdivmod_optab and ↵Prathamesh Kulkarni1-2/+2
udivmod_optab. 2016-10-14 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> * optabs.def: Remove optab function gen_int_libfunc for sdivmod_optab and udivmod_optab. From-SVN: r241147
2016-05-03re PR target/49244 (__sync or __atomic builtins will not emit 'lock ↵Jakub Jelinek1-0/+3
bts/btr/btc') PR target/49244 * tree-ssa-ccp.c: Include stor-layout.h and optabs-query.h. (optimize_atomic_bit_test_and): New function. (pass_fold_builtins::execute): Use it. * optabs.def (atomic_bit_test_and_set_optab, atomic_bit_test_and_complement_optab, atomic_bit_test_and_reset_optab): New optabs. * internal-fn.def (ATOMIC_BIT_TEST_AND_SET, ATOMIC_BIT_TEST_AND_COMPLEMENT, ATOMIC_BIT_TEST_AND_RESET): New ifns. * builtins.h (expand_ifn_atomic_bit_test_and): New prototype. * builtins.c (expand_ifn_atomic_bit_test_and): New function. * internal-fn.c (expand_ATOMIC_BIT_TEST_AND_SET, expand_ATOMIC_BIT_TEST_AND_COMPLEMENT, expand_ATOMIC_BIT_TEST_AND_RESET): New functions. * doc/md.texi (atomic_bit_test_and_set@var{mode}, atomic_bit_test_and_complement@var{mode}, atomic_bit_test_and_reset@var{mode}): Document. * config/i386/sync.md (atomic_bit_test_and_set<mode>, atomic_bit_test_and_complement<mode>, atomic_bit_test_and_reset<mode>): New expanders. (atomic_bit_test_and_set<mode>_1, atomic_bit_test_and_complement<mode>_1, atomic_bit_test_and_reset<mode>_1): New insns. * gcc.target/i386/pr49244-1.c: New test. * gcc.target/i386/pr49244-2.c: New test. From-SVN: r235813
2016-01-14Tidy: remove reduc_xxx_optab migration code.Alan Lawrence1-7/+0
* doc/md.texi (reduc_smin_@var{m}, reduc_smax_@var{m}, reduc_umin_@var{m}, reduc_umax_@var{m}, reduc_splus_@var{m}, reduc_uplus_@var{m}): Remove. * expr.c (expand_expr_real_2): Remove expansion path for reduc_[us](min|max|plus) optabs. * optabs-tree.c (scalar_reduc_to_vector): Remove. * optabs-tree.h (scalar_reduc_to_vector): Remove. * optabs.def (reduc_smax_optab, reduc_smin_optab, reduc_splus_optab, reduc_umax_optab, reduc_umin_optab, reduc_uplus_optab): Remove. * tree-vect-loop.c (vectorizable_reduction): Remove test for reduc_[us](min|max|plus) optabs. From-SVN: r232373
2016-01-04Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r232055
2015-12-03Add an rsqrt_optab and IFN_RSQRT internal functionRichard Sandiford1-0/+1
All current uses of builtin_reciprocal convert 1.0/sqrt into rsqrt. This patch adds an rsqrt optab and associated internal function for that instead. We can then pick up the vector forms of rsqrt automatically, fixing an AArch64 regression from my internal_fn patches. With that change, builtin_reciprocal only needs to handle target-specific built-in functions. I've restricted the hook to those since, if we need a reciprocal of another standard function later, I think there should be a strong preference for adding a new optab and internal function for it, rather than hiding the code in a backend. Three targets implement builtin_reciprocal: aarch64, i386 and rs6000. i386 and rs6000 already used the obvious rsqrt<mode>2 pattern names for the instructions, so they pick up the new code automatically. aarch64 needs a slight rename. mn10300 is unusual in that its native operation is rsqrt, and sqrt is approximated as 1.0/rsqrt. The port also uses rsqrt<mode>2 for the rsqrt pattern, so after the patch we now pick it up as a native operation. Two other ports define rsqrt patterns: sh and v850. AFAICT these patterns aren't currently used, but I think the patch does what the authors of the patterns would have expected. There's obviously some risk of fallout though. Tested on x86_64-linux-gnu, aarch64-linux-gnu, arm-linux-gnueabihf (as a target without the hooks) and powerpc64-linux-gnu. gcc/ * internal-fn.def (RSQRT): New function. * optabs.def (rsqrt_optab): New optab. * doc/md.texi (rsqrtM2): Document. * target.def (builtin_reciprocal): Replace gcall argument with a function decl. Restrict hook to machine functions. * doc/tm.texi: Regenerate. * targhooks.h (default_builtin_reciprocal): Update prototype. * targhooks.c (default_builtin_reciprocal): Likewise. * tree-ssa-math-opts.c: Include internal-fn.h. (internal_fn_reciprocal): New function. (pass_cse_reciprocals::execute): Call it, and build a call to an internal function on success. Only call targetm.builtin_reciprocal for machine functions. * config/aarch64/aarch64-protos.h (aarch64_builtin_rsqrt): Remove second argument. * config/aarch64/aarch64-builtins.c (aarch64_expand_builtin_rsqrt): Rename aarch64_rsqrt_<mode>2 to rsqrt<mode>2. (aarch64_builtin_rsqrt): Remove md_fn argument and only handle machine functions. * config/aarch64/aarch64.c (use_rsqrt_p): New function. (aarch64_builtin_reciprocal): Replace gcall argument with a function decl. Use use_rsqrt_p. Remove optimize_size check. Only handle machine functions. Update call to aarch64_builtin_rsqrt. (aarch64_optab_supported_p): New function. (TARGET_OPTAB_SUPPORTED_P): Define. * config/aarch64/aarch64-simd.md (aarch64_rsqrt_<mode>2): Rename to... (rsqrt<mode>2): ...this. * config/i386/i386.c (use_rsqrt_p): New function. (ix86_builtin_reciprocal): Replace gcall argument with a function decl. Use use_rsqrt_p. Remove optimize_insn_for_size_p check. Only handle machine functions. (ix86_optab_supported_p): Handle rsqrt_optab. * config/rs6000/rs6000.c (TARGET_OPTAB_SUPPORTED_P): Define. (rs6000_builtin_reciprocal): Replace gcall argument with a function decl. Remove optimize_insn_for_size_p check. Only handle machine functions. (rs6000_optab_supported_p): New function. From-SVN: r231229
2015-12-02Check for invalid FAILsRichard Sandiford1-2/+23
This patch makes it a compile-time error for an internal-fn optab to FAIL. There are certainly other optabs and patterns besides these that aren't allowed to fail, but this at least deals with the immediate point of controversy. Tested normally on x86_64-linux-gnu. Also tested by building one configuration per cpu directory. arc-elf and pdp11 didn't build for unrelated reasons, but I checked that insn-emit.o built for both without error. gcc/ * Makefile.in (GENSUPPORT_H): New macro. (build/gensupport.o, build/read-rtl.o, build/genattr.o) (build/genattr-common.o, build/genattrtab.o, build/genautomata.o) (build/gencodes.o, build/genconditions.o, build/genconfig.o) (build/genconstants.o, build/genextract.o, build/genflags.o) (build/gentarget-def.o): Use it. (build/genemit.o): Likewise. Depend on internal-fn.def. * genopinit.c: Move block comment to optabs.def. (optab_tag, optab_def): Move to gensupport.h (pattern): Likewise, renaming to optab_pattern. (match_pattern): Move to gensupport.c (gen_insn): Use find_optab. (patterns, pattern_cmp): Replace pattern with optab_pattern. (main): Likewise. Use num_optabs. * optabs.def: Add comment that was previously in genopinit.c. * gensupport.h (optab_tag): Moved from genopinit.c (optab_def): Likewise, expanding commentary. (optab_pattern): Likewise, after renaming from pattern. (optabs, num_optabs, find_optab): Declare. * gensupport.c (optabs): Moved from genopinit.c. (num_optabs): New variable. (match_pattern): Moved from genopinit.c. (find_optab): New function, extracted from genopinit.c:gen_insn. * genemit.c (nofail_optabs): New variable. (emit_c_code): New function. (gen_expand): Check whether the instruction is an optab that isn't allowed to fail. Call emit_c_code. (gen_split): Call emit_c_code here too. (main): Initialize nofail_optabs. Don't emit FAIL and DONE here. From-SVN: r231160
2015-11-25optabs.def: Add new optabs fmax_optab/fmin_optab.David Sherwood1-0/+4
2015-11-25 David Sherwood <david.sherwood@arm.com> * optabs.def: Add new optabs fmax_optab/fmin_optab. * internal-fn.def: Add new fmax/fmin internal functions. * doc/md.texi: Add fmin and fmax patterns. From-SVN: r230888
2015-11-23Add uaddv4_optab and usubv4_optabRichard Henderson1-0/+2
PR target/67089 * optabs.def (uaddv4_optab, usubv4_optab): New. * internal-fn.c (expand_addsub_overflow): Use uaddv4_optab and usubv4_optab in the u +- u -> u case. * doc/md.texi (Standard Names): Document addv{m}4, subv{m}4, mulv{m}4, uaddv{m}4, usubv{m}4, umulv{m}4. * config/i386/i386.md (uaddv<SWI>4, usubv<SWI>4): New. From-SVN: r230767
2015-11-10optabs-query.h (get_vcond_mask_icode): New.Ilya Enkovich1-0/+1
gcc/ 2015-11-10 Ilya Enkovich <enkovich.gnu@gmail.com> * optabs-query.h (get_vcond_mask_icode): New. * optabs-tree.c (expand_vec_cond_expr_p): Use get_vcond_mask_icode for VEC_COND_EXPR with mask. * optabs.c (expand_vec_cond_mask_expr): New. (expand_vec_cond_expr): Use get_vcond_mask_icode when possible. * optabs.def (vcond_mask_optab): New. * tree-vect-patterns.c (vect_recog_bool_pattern): Don't generate redundant comparison for COND_EXPR. * tree-vect-stmts.c (vect_is_simple_cond): Allow SSA_NAME as a condition. (vectorizable_condition): Likewise. * tree-vect-slp.c (vect_get_and_check_slp_defs): Allow cond_exp with no embedded comparison. (vect_build_slp_tree_1): Likewise. From-SVN: r230101
2015-11-10internal-fn.c (expand_MASK_LOAD): Adjust to maskload optab changes.Ilya Enkovich1-2/+2
gcc/ * internal-fn.c (expand_MASK_LOAD): Adjust to maskload optab changes. (expand_MASK_STORE): Adjust to maskstore optab changes. * optabs-query.c (can_vec_mask_load_store_p): Add MASK_MODE arg. Adjust to maskload, maskstore optab changes. * optabs-query.h (can_vec_mask_load_store_p): Add MASK_MODE arg. * optabs.def (maskload_optab): Transform into convert optab. (maskstore_optab): Likewise. * tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to can_vec_mask_load_store_p signature change. (predicate_mem_writes): Use boolean mask. * tree-vect-stmts.c (vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p signature change. Allow invariant masks. (vectorizable_operation): Ignore type precision for boolean vectors. gcc/testsuite/ * gcc.target/i386/avx2-vec-mask-bit-not.c: New test. From-SVN: r230099