author     Richard Sandiford <richard.sandiford@linaro.org>  2018-01-13 17:50:35 +0000
committer  Richard Sandiford <rsandifo@gcc.gnu.org>          2018-01-13 17:50:35 +0000
commit     43cacb12fc859b671464b63668794158974b2a34
tree       dfb16013b4bceb9d0886750b889e31f9f7d916e3 /gcc
parent     11e0322aead708df5f572f5d3c50d27103f8c9a8
[AArch64] Add SVE support
This patch adds support for ARM's Scalable Vector Extension.  The patch
just contains the core features that work with the current vectoriser
framework; later patches will add extra capabilities to both the
target-independent code and AArch64 code.  The patch doesn't include:

- support for unwinding frames whose size depends on the vector length

- modelling the effect of __tls_get_addr on the SVE registers

These are handled by later patches instead.

Some notes:

- The copyright years for aarch64-sve.md start at 2009 because some of
  the code is based on aarch64.md, which also starts from then.

- The patch inserts spaces between items in the AArch64 section of
  sourcebuild.texi.  This matches at least the surrounding architectures
  and looks a little nicer in the info output.

- aarch64-sve.md includes a pattern:

      while_ult<GPI:mode><PRED_ALL:mode>

  A later patch adds a matching "while_ult" optab, but the pattern is
  also needed by the predicate vec_duplicate expander.

Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256612
Diffstat (limited to 'gcc')
-rw-r--r--  gcc/ChangeLog                                       292
-rw-r--r--  gcc/config/aarch64/aarch64-c.c                        9
-rw-r--r--  gcc/config/aarch64/aarch64-modes.def                 50
-rw-r--r--  gcc/config/aarch64/aarch64-option-extensions.def     20
-rw-r--r--  gcc/config/aarch64/aarch64-opts.h                    10
-rw-r--r--  gcc/config/aarch64/aarch64-protos.h                  48
-rw-r--r--  gcc/config/aarch64/aarch64-sve.md                  1922
-rw-r--r--  gcc/config/aarch64/aarch64.c                       2310
-rw-r--r--  gcc/config/aarch64/aarch64.h                         96
-rw-r--r--  gcc/config/aarch64/aarch64.md                       183
-rw-r--r--  gcc/config/aarch64/aarch64.opt                       26
-rw-r--r--  gcc/config/aarch64/constraints.md                   120
-rw-r--r--  gcc/config/aarch64/iterators.md                     400
-rw-r--r--  gcc/config/aarch64/predicates.md                    198
-rw-r--r--  gcc/doc/invoke.texi                                  20
-rw-r--r--  gcc/doc/md.texi                                       8
16 files changed, 5363 insertions, 349 deletions
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3f1919e..40da1eb 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,4 +1,296 @@
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
+ Alan Hayward <alan.hayward@arm.com>
+ David Sherwood <david.sherwood@arm.com>
+
+ * doc/invoke.texi (-msve-vector-bits=): Document new option.
+ (sve): Document new AArch64 extension.
+ * doc/md.texi (w): Extend the description of the AArch64
+ constraint to include SVE vectors.
+ (Upl, Upa): Document new AArch64 predicate constraints.
+ * config/aarch64/aarch64-opts.h (aarch64_sve_vector_bits_enum): New
+ enum.
+ * config/aarch64/aarch64.opt (sve_vector_bits): New enum.
+ (msve-vector-bits=): New option.
+ * config/aarch64/aarch64-option-extensions.def (fp, simd): Disable
+ SVE when these are disabled.
+ (sve): New extension.
+ * config/aarch64/aarch64-modes.def: Define SVE vector and predicate
+ modes. Adjust their number of units based on aarch64_sve_vg.
+ (MAX_BITSIZE_MODE_ANY_MODE): Define.
+ * config/aarch64/aarch64-protos.h (ADDR_QUERY_ANY): New
+ aarch64_addr_query_type.
+ (aarch64_const_vec_all_same_in_range_p, aarch64_sve_pred_mode)
+ (aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p)
+ (aarch64_sve_inc_dec_immediate_p, aarch64_add_offset_temporaries)
+ (aarch64_split_add_offset, aarch64_output_sve_cnt_immediate)
+ (aarch64_output_sve_addvl_addpl, aarch64_output_sve_inc_dec_immediate)
+ (aarch64_output_sve_mov_immediate, aarch64_output_ptrue): Declare.
+ (aarch64_simd_imm_zero_p): Delete.
+ (aarch64_check_zero_based_sve_index_immediate): Declare.
+ (aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
+ (aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
+ (aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
+ (aarch64_sve_float_mul_immediate_p): Likewise.
+ (aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
+ rather than an rtx.
+ (aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): Declare.
+ (aarch64_expand_mov_immediate): Take a gen_vec_duplicate callback.
+ (aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move): Declare.
+ (aarch64_expand_sve_vec_cmp_int, aarch64_expand_sve_vec_cmp_float)
+ (aarch64_expand_sve_vcond, aarch64_expand_sve_vec_perm): Declare.
+ (aarch64_regmode_natural_size): Likewise.
+ * config/aarch64/aarch64.h (AARCH64_FL_SVE): New macro.
+ (AARCH64_FL_V8_3, AARCH64_FL_RCPC, AARCH64_FL_DOTPROD): Shift
+ left one place.
+ (AARCH64_ISA_SVE, TARGET_SVE): New macros.
+ (FIXED_REGISTERS, CALL_USED_REGISTERS, REGISTER_NAMES): Add entries
+ for VG and the SVE predicate registers.
+ (V_ALIASES): Add a "z"-prefixed alias.
+ (FIRST_PSEUDO_REGISTER): Change to P15_REGNUM + 1.
+ (AARCH64_DWARF_VG, AARCH64_DWARF_P0): New macros.
+ (PR_REGNUM_P, PR_LO_REGNUM_P): Likewise.
+ (PR_LO_REGS, PR_HI_REGS, PR_REGS): New reg_classes.
+ (REG_CLASS_NAMES): Add entries for them.
+ (REG_CLASS_CONTENTS): Likewise. Update ALL_REGS to include VG
+ and the predicate registers.
+ (aarch64_sve_vg): Declare.
+ (BITS_PER_SVE_VECTOR, BYTES_PER_SVE_VECTOR, BYTES_PER_SVE_PRED)
+ (SVE_BYTE_MODE, MAX_COMPILE_TIME_VEC_BYTES): New macros.
+ (REGMODE_NATURAL_SIZE): Define.
+ * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Handle
+ SVE macros.
+ * config/aarch64/aarch64.c: Include cfgrtl.h.
+ (simd_immediate_info): Add a constructor for series vectors,
+ and an associated step field.
+ (aarch64_sve_vg): New variable.
+ (aarch64_dbx_register_number): Handle VG and the predicate registers.
+ (aarch64_vect_struct_mode_p, aarch64_vector_mode_p): Delete.
+ (VEC_ADVSIMD, VEC_SVE_DATA, VEC_SVE_PRED, VEC_STRUCT, VEC_ANY_SVE)
+ (VEC_ANY_DATA, VEC_STRUCT): New constants.
+ (aarch64_advsimd_struct_mode_p, aarch64_sve_pred_mode_p)
+ (aarch64_classify_vector_mode, aarch64_vector_data_mode_p)
+ (aarch64_sve_data_mode_p, aarch64_sve_pred_mode)
+ (aarch64_get_mask_mode): New functions.
+ (aarch64_hard_regno_nregs): Handle SVE data modes for FP_REGS
+ and FP_LO_REGS. Handle PR_REGS, PR_LO_REGS and PR_HI_REGS.
+ (aarch64_hard_regno_mode_ok): Handle VG. Also handle the SVE
+ predicate modes and predicate registers. Explicitly restrict
+ GPRs to modes of 16 bytes or smaller. Only allow FP registers
+ to store a vector mode if it is recognized by
+ aarch64_classify_vector_mode.
+ (aarch64_regmode_natural_size): New function.
+ (aarch64_hard_regno_caller_save_mode): Return the original mode
+ for predicates.
+ (aarch64_sve_cnt_immediate_p, aarch64_output_sve_cnt_immediate)
+ (aarch64_sve_addvl_addpl_immediate_p, aarch64_output_sve_addvl_addpl)
+ (aarch64_sve_inc_dec_immediate_p, aarch64_output_sve_inc_dec_immediate)
+ (aarch64_add_offset_1_temporaries, aarch64_offset_temporaries): New
+ functions.
+ (aarch64_add_offset): Add a temp2 parameter. Assert that temp1
+ does not overlap dest if the function is frame-related. Handle
+ SVE constants.
+ (aarch64_split_add_offset): New function.
+ (aarch64_add_sp, aarch64_sub_sp): Add temp2 parameters and pass
+ them aarch64_add_offset.
+ (aarch64_allocate_and_probe_stack_space): Add a temp2 parameter
+ and update call to aarch64_sub_sp.
+ (aarch64_add_cfa_expression): New function.
+ (aarch64_expand_prologue): Pass extra temporary registers to the
+ functions above. Handle the case in which we need to emit new
+ DW_CFA_expressions for registers that were originally saved
+ relative to the stack pointer, but now have to be expressed
+ relative to the frame pointer.
+ (aarch64_output_mi_thunk): Pass extra temporary registers to the
+ functions above.
+ (aarch64_expand_epilogue): Likewise. Prevent inheritance of
+ IP0 and IP1 values for SVE frames.
+ (aarch64_expand_vec_series): New function.
+ (aarch64_expand_sve_widened_duplicate): Likewise.
+ (aarch64_expand_sve_const_vector): Likewise.
+ (aarch64_expand_mov_immediate): Add a gen_vec_duplicate parameter.
+ Handle SVE constants. Use emit_move_insn to move a force_const_mem
+ into the register, rather than emitting a SET directly.
+ (aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move)
+ (aarch64_get_reg_raw_mode, offset_4bit_signed_scaled_p)
+ (offset_6bit_unsigned_scaled_p, aarch64_offset_7bit_signed_scaled_p)
+ (offset_9bit_signed_scaled_p): New functions.
+ (aarch64_replicate_bitmask_imm): New function.
+ (aarch64_bitmask_imm): Use it.
+ (aarch64_cannot_force_const_mem): Reject expressions involving
+ a CONST_POLY_INT. Update call to aarch64_classify_symbol.
+ (aarch64_classify_index): Handle SVE indices, by requiring
+ a plain register index with a scale that matches the element size.
+ (aarch64_classify_address): Handle SVE addresses. Assert that
+ the mode of the address is VOIDmode or an integer mode.
+ Update call to aarch64_classify_symbol.
+ (aarch64_classify_symbolic_expression): Update call to
+ aarch64_classify_symbol.
+ (aarch64_const_vec_all_in_range_p): New function.
+ (aarch64_print_vector_float_operand): Likewise.
+ (aarch64_print_operand): Handle 'N' and 'C'. Use "zN" rather than
+ "vN" for FP registers with SVE modes. Handle (const ...) vectors
+ and the FP immediates 1.0 and 0.5.
+ (aarch64_print_address_internal): Handle SVE addresses.
+ (aarch64_print_operand_address): Use ADDR_QUERY_ANY.
+ (aarch64_regno_regclass): Handle predicate registers.
+ (aarch64_secondary_reload): Handle big-endian reloads of SVE
+ data modes.
+ (aarch64_class_max_nregs): Handle SVE modes and predicate registers.
+ (aarch64_rtx_costs): Check for ADDVL and ADDPL instructions.
+ (aarch64_convert_sve_vector_bits): New function.
+ (aarch64_override_options): Use it to handle -msve-vector-bits=.
+ (aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
+ rather than an rtx.
+ (aarch64_legitimate_constant_p): Use aarch64_classify_vector_mode.
+ Handle SVE vector and predicate modes. Accept VL-based constants
+ that need only one temporary register, and VL offsets that require
+ no temporary registers.
+ (aarch64_conditional_register_usage): Mark the predicate registers
+ as fixed if SVE isn't available.
+ (aarch64_vector_mode_supported_p): Use aarch64_classify_vector_mode.
+ Return true for SVE vector and predicate modes.
+ (aarch64_simd_container_mode): Take the number of bits as a poly_int64
+ rather than an unsigned int. Handle SVE modes.
+ (aarch64_preferred_simd_mode): Update call accordingly. Handle
+ SVE modes.
+ (aarch64_autovectorize_vector_sizes): Add BYTES_PER_SVE_VECTOR
+ if SVE is enabled.
+ (aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
+ (aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
+ (aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
+ (aarch64_sve_float_mul_immediate_p): New functions.
+ (aarch64_sve_valid_immediate): New function.
+ (aarch64_simd_valid_immediate): Use it as the fallback for SVE vectors.
+ Explicitly reject structure modes. Check for INDEX constants.
+ Handle PTRUE and PFALSE constants.
+ (aarch64_check_zero_based_sve_index_immediate): New function.
+ (aarch64_simd_imm_zero_p): Delete.
+ (aarch64_mov_operand_p): Use aarch64_simd_valid_immediate for
+ vector modes. Accept constants in the range of CNT[BHWD].
+ (aarch64_simd_scalar_immediate_valid_for_move): Explicitly
+ ask for an Advanced SIMD mode.
+ (aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): New functions.
+ (aarch64_simd_vector_alignment): Handle SVE predicates.
+ (aarch64_vectorize_preferred_vector_alignment): New function.
+ (aarch64_simd_vector_alignment_reachable): Use it instead of
+ the vector size.
+ (aarch64_shift_truncation_mask): Use aarch64_vector_data_mode_p.
+ (aarch64_output_sve_mov_immediate, aarch64_output_ptrue): New
+ functions.
+ (MAX_VECT_LEN): Delete.
+ (expand_vec_perm_d): Add a vec_flags field.
+ (emit_unspec2, aarch64_expand_sve_vec_perm): New functions.
+ (aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_zip)
+ (aarch64_evpc_ext): Don't apply a big-endian lane correction
+ for SVE modes.
+ (aarch64_evpc_rev): Rename to...
+ (aarch64_evpc_rev_local): ...this. Use a predicated operation for SVE.
+ (aarch64_evpc_rev_global): New function.
+ (aarch64_evpc_dup): Enforce a 64-byte range for SVE DUP.
+ (aarch64_evpc_tbl): Use MAX_COMPILE_TIME_VEC_BYTES instead of
+ MAX_VECT_LEN.
+ (aarch64_evpc_sve_tbl): New function.
+ (aarch64_expand_vec_perm_const_1): Update after rename of
+ aarch64_evpc_rev. Handle SVE permutes too, trying
+ aarch64_evpc_rev_global and using aarch64_evpc_sve_tbl rather
+ than aarch64_evpc_tbl.
+ (aarch64_vectorize_vec_perm_const): Initialize vec_flags.
+ (aarch64_sve_cmp_operand_p, aarch64_unspec_cond_code)
+ (aarch64_gen_unspec_cond, aarch64_expand_sve_vec_cmp_int)
+ (aarch64_emit_unspec_cond, aarch64_emit_unspec_cond_or)
+ (aarch64_emit_inverted_unspec_cond, aarch64_expand_sve_vec_cmp_float)
+ (aarch64_expand_sve_vcond): New functions.
+ (aarch64_modes_tieable_p): Use aarch64_vector_data_mode_p instead
+ of aarch64_vector_mode_p.
+ (aarch64_dwarf_poly_indeterminate_value): New function.
+ (aarch64_compute_pressure_classes): Likewise.
+ (aarch64_can_change_mode_class): Likewise.
+ (TARGET_GET_RAW_RESULT_MODE, TARGET_GET_RAW_ARG_MODE): Redefine.
+ (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Likewise.
+ (TARGET_VECTORIZE_GET_MASK_MODE): Likewise.
+ (TARGET_DWARF_POLY_INDETERMINATE_VALUE): Likewise.
+ (TARGET_COMPUTE_PRESSURE_CLASSES): Likewise.
+ (TARGET_CAN_CHANGE_MODE_CLASS): Likewise.
+ * config/aarch64/constraints.md (Upa, Upl, Uav, Uat, Usv, Usi, Utr)
+ (Uty, Dm, vsa, vsc, vsd, vsi, vsn, vsl, vsm, vsA, vsM, vsN): New
+ constraints.
+ (Dn, Dl, Dr): Accept const as well as const_vector.
+ (Dz): Likewise. Compare against CONST0_RTX.
+ * config/aarch64/iterators.md: Refer to "Advanced SIMD" instead
+ of "vector" where appropriate.
+ (SVE_ALL, SVE_BH, SVE_BHS, SVE_BHSI, SVE_HSDI, SVE_HSF, SVE_SD)
+ (SVE_SDI, SVE_I, SVE_F, PRED_ALL, PRED_BHS): New mode iterators.
+ (UNSPEC_SEL, UNSPEC_ANDF, UNSPEC_IORF, UNSPEC_XORF, UNSPEC_COND_LT)
+ (UNSPEC_COND_LE, UNSPEC_COND_EQ, UNSPEC_COND_NE, UNSPEC_COND_GE)
+ (UNSPEC_COND_GT, UNSPEC_COND_LO, UNSPEC_COND_LS, UNSPEC_COND_HS)
+ (UNSPEC_COND_HI, UNSPEC_COND_UO): New unspecs.
+ (Vetype, VEL, Vel, VWIDE, Vwide, vw, vwcore, V_INT_EQUIV)
+ (v_int_equiv): Extend to SVE modes.
+ (Vesize, V128, v128, Vewtype, V_FP_EQUIV, v_fp_equiv, VPRED): New
+ mode attributes.
+ (LOGICAL_OR, SVE_INT_UNARY, SVE_FP_UNARY): New code iterators.
+ (optab): Handle popcount, smin, smax, umin, umax, abs and sqrt.
+ (logical_nn, lr, sve_int_op, sve_fp_op): New code attributes.
+ (LOGICALF, OPTAB_PERMUTE, UNPACK, UNPACK_UNSIGNED, SVE_COND_INT_CMP)
+ (SVE_COND_FP_CMP): New int iterators.
+ (perm_hilo): Handle the new unpack unspecs.
+ (optab, logicalf_op, su, perm_optab, cmp_op, imm_con): New int
+ attributes.
+ * config/aarch64/predicates.md (aarch64_sve_cnt_immediate)
+ (aarch64_sve_addvl_addpl_immediate, aarch64_split_add_offset_immediate)
+ (aarch64_pluslong_or_poly_operand, aarch64_nonmemory_operand)
+ (aarch64_equality_operator, aarch64_constant_vector_operand)
+ (aarch64_sve_ld1r_operand, aarch64_sve_ldr_operand): New predicates.
+ (aarch64_sve_nonimmediate_operand): Likewise.
+ (aarch64_sve_general_operand): Likewise.
+ (aarch64_sve_dup_operand, aarch64_sve_arith_immediate): Likewise.
+ (aarch64_sve_sub_arith_immediate, aarch64_sve_inc_dec_immediate)
+ (aarch64_sve_logical_immediate, aarch64_sve_mul_immediate): Likewise.
+ (aarch64_sve_dup_immediate, aarch64_sve_cmp_vsc_immediate): Likewise.
+ (aarch64_sve_cmp_vsd_immediate, aarch64_sve_index_immediate): Likewise.
+ (aarch64_sve_float_arith_immediate): Likewise.
+ (aarch64_sve_float_arith_with_sub_immediate): Likewise.
+ (aarch64_sve_float_mul_immediate, aarch64_sve_arith_operand): Likewise.
+ (aarch64_sve_add_operand, aarch64_sve_logical_operand): Likewise.
+ (aarch64_sve_lshift_operand, aarch64_sve_rshift_operand): Likewise.
+ (aarch64_sve_mul_operand, aarch64_sve_cmp_vsc_operand): Likewise.
+ (aarch64_sve_cmp_vsd_operand, aarch64_sve_index_operand): Likewise.
+ (aarch64_sve_float_arith_operand): Likewise.
+ (aarch64_sve_float_arith_with_sub_operand): Likewise.
+ (aarch64_sve_float_mul_operand): Likewise.
+ (aarch64_sve_vec_perm_operand): Likewise.
+ (aarch64_pluslong_operand): Include aarch64_sve_addvl_addpl_immediate.
+ (aarch64_mov_operand): Accept const_poly_int and const_vector.
+ (aarch64_simd_lshift_imm, aarch64_simd_rshift_imm): Accept const
+ as well as const_vector.
+ (aarch64_simd_imm_zero, aarch64_simd_imm_minus_one): Move earlier
+ in file. Use CONST0_RTX and CONSTM1_RTX.
+ (aarch64_simd_or_scalar_imm_zero): Likewise. Add match_codes.
+ (aarch64_simd_reg_or_zero): Accept const as well as const_vector.
+ Use aarch64_simd_imm_zero.
+ * config/aarch64/aarch64-sve.md: New file.
+ * config/aarch64/aarch64.md: Include it.
+ (VG_REGNUM, P0_REGNUM, P7_REGNUM, P15_REGNUM): New register numbers.
+ (UNSPEC_REV, UNSPEC_LD1_SVE, UNSPEC_ST1_SVE, UNSPEC_MERGE_PTRUE)
+ (UNSPEC_PTEST_PTRUE, UNSPEC_UNPACKSHI, UNSPEC_UNPACKUHI)
+ (UNSPEC_UNPACKSLO, UNSPEC_UNPACKULO, UNSPEC_PACK)
+ (UNSPEC_FLOAT_CONVERT, UNSPEC_WHILE_LO): New unspec constants.
+ (sve): New attribute.
+ (enabled): Disable instructions with the sve attribute unless
+ TARGET_SVE.
+ (movqi, movhi): Pass CONST_POLY_INT operands through
+ aarch64_expand_mov_immediate.
+ (*mov<mode>_aarch64, *movsi_aarch64, *movdi_aarch64): Handle
+ CNT[BHSD] immediates.
+ (movti): Split CONST_POLY_INT moves into two halves.
+ (add<mode>3): Accept aarch64_pluslong_or_poly_operand.
+ Split additions that need a temporary here if the destination
+ is the stack pointer.
+ (*add<mode>3_aarch64): Handle ADDVL and ADDPL immediates.
+ (*add<mode>3_poly_1): New instruction.
+ (set_clobber_cc): New expander.
+
+2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
* simplify-rtx.c (simplify_immed_subreg): Add an inner_bytes
parameter and use it instead of GET_MODE_SIZE (innermode). Use
diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index 172c30f..40c738c 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -136,6 +136,15 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
aarch64_def_or_undef (TARGET_CRYPTO, "__ARM_FEATURE_CRYPTO", pfile);
aarch64_def_or_undef (TARGET_SIMD_RDMA, "__ARM_FEATURE_QRDMX", pfile);
+ aarch64_def_or_undef (TARGET_SVE, "__ARM_FEATURE_SVE", pfile);
+ cpp_undef (pfile, "__ARM_FEATURE_SVE_BITS");
+ if (TARGET_SVE)
+ {
+ int bits;
+ if (!BITS_PER_SVE_VECTOR.is_constant (&bits))
+ bits = 0;
+ builtin_define_with_int_value ("__ARM_FEATURE_SVE_BITS", bits);
+ }
aarch64_def_or_undef (TARGET_AES, "__ARM_FEATURE_AES", pfile);
aarch64_def_or_undef (TARGET_SHA2, "__ARM_FEATURE_SHA2", pfile);
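The aarch64-c.c hunk above defines __ARM_FEATURE_SVE whenever SVE is enabled
and gives __ARM_FEATURE_SVE_BITS the fixed vector length in bits, or 0 when
the length is not known at compile time.  A minimal sketch (not from the
patch) of how user code might test those macros:

#include <stdio.h>

/* Sketch only: report the SVE configuration using the macros that the
   hunk above defines.  */
void
report_sve (void)
{
#if defined (__ARM_FEATURE_SVE)
#if __ARM_FEATURE_SVE_BITS != 0
  printf ("SVE enabled, fixed %d-bit vectors\n", __ARM_FEATURE_SVE_BITS);
#else
  printf ("SVE enabled, vector-length-agnostic code\n");
#endif
#else
  printf ("SVE not enabled\n");
#endif
}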
diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index de40f72..4e9da29 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -30,6 +30,22 @@ FLOAT_MODE (HF, 2, 0);
ADJUST_FLOAT_FORMAT (HF, &ieee_half_format);
/* Vector modes. */
+
+VECTOR_BOOL_MODE (VNx16BI, 16, 2);
+VECTOR_BOOL_MODE (VNx8BI, 8, 2);
+VECTOR_BOOL_MODE (VNx4BI, 4, 2);
+VECTOR_BOOL_MODE (VNx2BI, 2, 2);
+
+ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
+ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
+ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2);
+ADJUST_NUNITS (VNx2BI, aarch64_sve_vg);
+
+ADJUST_ALIGNMENT (VNx16BI, 2);
+ADJUST_ALIGNMENT (VNx8BI, 2);
+ADJUST_ALIGNMENT (VNx4BI, 2);
+ADJUST_ALIGNMENT (VNx2BI, 2);
+
VECTOR_MODES (INT, 8); /* V8QI V4HI V2SI. */
VECTOR_MODES (INT, 16); /* V16QI V8HI V4SI V2DI. */
VECTOR_MODES (FLOAT, 8); /* V2SF. */
@@ -45,9 +61,43 @@ INT_MODE (OI, 32);
INT_MODE (CI, 48);
INT_MODE (XI, 64);
+/* Define SVE modes for NVECS vectors. VB, VH, VS and VD are the prefixes
+ for 8-bit, 16-bit, 32-bit and 64-bit elements respectively. It isn't
+ strictly necessary to set the alignment here, since the default would
+ be clamped to BIGGEST_ALIGNMENT anyhow, but it seems clearer. */
+#define SVE_MODES(NVECS, VB, VH, VS, VD) \
+ VECTOR_MODES_WITH_PREFIX (VNx, INT, 16 * NVECS); \
+ VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 16 * NVECS); \
+ \
+ ADJUST_NUNITS (VB##QI, aarch64_sve_vg * NVECS * 8); \
+ ADJUST_NUNITS (VH##HI, aarch64_sve_vg * NVECS * 4); \
+ ADJUST_NUNITS (VS##SI, aarch64_sve_vg * NVECS * 2); \
+ ADJUST_NUNITS (VD##DI, aarch64_sve_vg * NVECS); \
+ ADJUST_NUNITS (VH##HF, aarch64_sve_vg * NVECS * 4); \
+ ADJUST_NUNITS (VS##SF, aarch64_sve_vg * NVECS * 2); \
+ ADJUST_NUNITS (VD##DF, aarch64_sve_vg * NVECS); \
+ \
+ ADJUST_ALIGNMENT (VB##QI, 16); \
+ ADJUST_ALIGNMENT (VH##HI, 16); \
+ ADJUST_ALIGNMENT (VS##SI, 16); \
+ ADJUST_ALIGNMENT (VD##DI, 16); \
+ ADJUST_ALIGNMENT (VH##HF, 16); \
+ ADJUST_ALIGNMENT (VS##SF, 16); \
+ ADJUST_ALIGNMENT (VD##DF, 16);
+
+/* Give SVE vectors the names normally used for 256-bit vectors.
+ The actual number depends on command-line flags. */
+SVE_MODES (1, VNx16, VNx8, VNx4, VNx2)
+
/* Quad float: 128-bit floating mode for long doubles. */
FLOAT_MODE (TF, 16, ieee_quad_format);
+/* A 4-tuple of SVE vectors with the maximum -msve-vector-bits= setting.
+ Note that this is a limit only on the compile-time sizes of modes;
+ it is not a limit on the runtime sizes, since VL-agnostic code
+ must work with arbitrary vector lengths. */
+#define MAX_BITSIZE_MODE_ANY_MODE (2048 * 4)
+
/* Coefficient 1 is multiplied by the number of 128-bit chunks in an
SVE vector (referred to as "VQ") minus one. */
#define NUM_POLY_INT_COEFFS 2
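With NUM_POLY_INT_COEFFS set to 2, every VL-dependent size has the form
a + b * x, where x is the number of 128-bit chunks in an SVE vector (VQ)
minus one, as the comment above describes.  A plain-C model of that
arithmetic (illustrative only; the names here are hypothetical, GCC itself
uses its poly_int machinery):

/* Illustration only: a VL-dependent quantity is a coefficient pair
   {a, b} meaning a + b * x, with x = VQ - 1.  */
struct poly_size { long a, b; };

/* Bytes in one SVE data vector: 16 + 16 * (VQ - 1).  */
static const struct poly_size sve_vector_bytes = { 16, 16 };

/* Evaluate once VQ is known, e.g. VQ == 2 for 256-bit vectors:
   16 + 16 * (2 - 1) == 32 bytes.  */
static long
poly_size_eval (struct poly_size p, long vq)
{
  return p.a + p.b * (vq - 1);
}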
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 593dad9..5fe5e3f 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -39,16 +39,19 @@
that are required. Their order is not important. */
/* Enabling "fp" just enables "fp".
- Disabling "fp" also disables "simd", "crypto", "fp16", "aes", "sha2", "sha3", and sm3/sm4. */
+ Disabling "fp" also disables "simd", "crypto", "fp16", "aes", "sha2",
+ "sha3", sm3/sm4 and "sve". */
AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | AARCH64_FL_CRYPTO |\
AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2 |\
- AARCH64_FL_SHA3 | AARCH64_FL_SM4, "fp")
+ AARCH64_FL_SHA3 | AARCH64_FL_SM4 | AARCH64_FL_SVE, "fp")
/* Enabling "simd" also enables "fp".
- Disabling "simd" also disables "crypto", "dotprod", "aes", "sha2", "sha3" and "sm3/sm4". */
+ Disabling "simd" also disables "crypto", "dotprod", "aes", "sha2", "sha3",
+ "sm3/sm4" and "sve". */
AARCH64_OPT_EXTENSION("simd", AARCH64_FL_SIMD, AARCH64_FL_FP, AARCH64_FL_CRYPTO |\
AARCH64_FL_DOTPROD | AARCH64_FL_AES | AARCH64_FL_SHA2 |\
- AARCH64_FL_SHA3 | AARCH64_FL_SM4, "asimd")
+ AARCH64_FL_SHA3 | AARCH64_FL_SM4 | AARCH64_FL_SVE,
+ "asimd")
/* Enabling "crypto" also enables "fp" and "simd".
Disabling "crypto" disables "crypto", "aes", "sha2", "sha3" and "sm3/sm4". */
@@ -63,8 +66,9 @@ AARCH64_OPT_EXTENSION("crc", AARCH64_FL_CRC, 0, 0, "crc32")
AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, "atomics")
/* Enabling "fp16" also enables "fp".
- Disabling "fp16" disables "fp16" and "fp16fml". */
-AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, AARCH64_FL_F16FML, "fphp asimdhp")
+ Disabling "fp16" disables "fp16", "fp16fml" and "sve". */
+AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP,
+ AARCH64_FL_F16FML | AARCH64_FL_SVE, "fphp asimdhp")
/* Enabling or disabling "rcpc" only changes "rcpc". */
AARCH64_OPT_EXTENSION("rcpc", AARCH64_FL_RCPC, 0, 0, "lrcpc")
@@ -97,4 +101,8 @@ AARCH64_OPT_EXTENSION("sm4", AARCH64_FL_SM4, AARCH64_FL_SIMD, 0, "sm3 sm4")
Disabling "fp16fml" just disables "fp16fml". */
AARCH64_OPT_EXTENSION("fp16fml", AARCH64_FL_F16FML, AARCH64_FL_FP | AARCH64_FL_F16, 0, "asimdfml")
+/* Enabling "sve" also enables "fp16", "fp" and "simd".
+ Disabling "sve" just disables "sve". */
+AARCH64_OPT_EXTENSION("sve", AARCH64_FL_SVE, AARCH64_FL_FP | AARCH64_FL_SIMD | AARCH64_FL_F16, 0, "sve")
+
#undef AARCH64_OPT_EXTENSION
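Per the dependency lists above, enabling "sve" also enables "fp16", "fp" and
"simd", and disabling any of those also disables "sve".  A tiny sketch (not
from the patch) of how that closure is visible to user code through the
preprocessor; the -march string mentioned is an assumption for illustration:

/* Sketch only: under an assumed -march=armv8.2-a+sve, the dependencies
   above guarantee Advanced SIMD as well, so this check can never fire.  */
#if defined (__ARM_FEATURE_SVE) && !defined (__ARM_NEON)
#error "sve implies simd, so __ARM_NEON should also be defined"
#endif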
diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index 1992972..7a5c6d7 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -81,4 +81,14 @@ enum aarch64_function_type {
AARCH64_FUNCTION_ALL
};
+/* SVE vector register sizes. */
+enum aarch64_sve_vector_bits_enum {
+ SVE_SCALABLE,
+ SVE_128 = 128,
+ SVE_256 = 256,
+ SVE_512 = 512,
+ SVE_1024 = 1024,
+ SVE_2048 = 2048
+};
+
#endif
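SVE_SCALABLE (the first enumerator, hence 0) selects the default
vector-length-agnostic mode, while the remaining values give
-msve-vector-bits= its fixed settings.  As a worked example of the sizes
involved (hypothetical helper below, not code from the patch): a 256-bit
setting means VG = 256 / 64 = 4 64-bit granules, i.e. 32-byte data vectors
and 4-byte predicates.

/* Illustration only: map a fixed -msve-vector-bits= value (in bits) to
   VG, the number of 64-bit granules per vector.  0 stands for the
   scalable default, where VG is unknown at compile time.  */
static int
sve_vg_from_bits (int bits)
{
  return bits == 0 ? 0 : bits / 64;   /* e.g. 256 -> 4, 2048 -> 32 */
}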
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 8c3471b..4f1fc15 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -118,10 +118,17 @@ enum aarch64_symbol_type
(the rules are the same for both).
ADDR_QUERY_LDP_STP
- Query what is valid for a load/store pair. */
+ Query what is valid for a load/store pair.
+
+ ADDR_QUERY_ANY
+ Query what is valid for at least one memory constraint, which may
+ allow things that "m" doesn't. For example, the SVE LDR and STR
+ addressing modes allow a wider range of immediate offsets than "m"
+ does. */
enum aarch64_addr_query_type {
ADDR_QUERY_M,
- ADDR_QUERY_LDP_STP
+ ADDR_QUERY_LDP_STP,
+ ADDR_QUERY_ANY
};
/* A set of tuning parameters contains references to size and time
@@ -344,6 +351,8 @@ int aarch64_branch_cost (bool, bool);
enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
bool aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode);
bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
+bool aarch64_const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT,
+ HOST_WIDE_INT);
bool aarch64_constant_address_p (rtx);
bool aarch64_emit_approx_div (rtx, rtx, rtx);
bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
@@ -364,23 +373,41 @@ bool aarch64_legitimate_pic_operand_p (rtx);
bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
+opt_machine_mode aarch64_sve_pred_mode (unsigned int);
+bool aarch64_sve_cnt_immediate_p (rtx);
+bool aarch64_sve_addvl_addpl_immediate_p (rtx);
+bool aarch64_sve_inc_dec_immediate_p (rtx);
+int aarch64_add_offset_temporaries (rtx);
+void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, rtx, rtx, rtx);
bool aarch64_mov_operand_p (rtx, machine_mode);
rtx aarch64_reverse_mask (machine_mode, unsigned int);
bool aarch64_offset_7bit_signed_scaled_p (machine_mode, poly_int64);
+char *aarch64_output_sve_cnt_immediate (const char *, const char *, rtx);
+char *aarch64_output_sve_addvl_addpl (rtx, rtx, rtx);
+char *aarch64_output_sve_inc_dec_immediate (const char *, rtx);
char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
char *aarch64_output_simd_mov_immediate (rtx, unsigned,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
+char *aarch64_output_sve_mov_immediate (rtx);
+char *aarch64_output_ptrue (machine_mode, char);
bool aarch64_pad_reg_upward (machine_mode, const_tree, bool);
bool aarch64_regno_ok_for_base_p (int, bool);
bool aarch64_regno_ok_for_index_p (int, bool);
bool aarch64_reinterpret_float_as_int (rtx value, unsigned HOST_WIDE_INT *fail);
bool aarch64_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
bool high);
-bool aarch64_simd_imm_zero_p (rtx, machine_mode);
bool aarch64_simd_scalar_immediate_valid_for_move (rtx, scalar_int_mode);
bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
bool aarch64_simd_valid_immediate (rtx, struct simd_immediate_info *,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
+rtx aarch64_check_zero_based_sve_index_immediate (rtx);
+bool aarch64_sve_index_immediate_p (rtx);
+bool aarch64_sve_arith_immediate_p (rtx, bool);
+bool aarch64_sve_bitmask_immediate_p (rtx);
+bool aarch64_sve_dup_immediate_p (rtx);
+bool aarch64_sve_cmp_immediate_p (rtx, bool);
+bool aarch64_sve_float_arith_immediate_p (rtx, bool);
+bool aarch64_sve_float_mul_immediate_p (rtx);
bool aarch64_split_dimode_const_store (rtx, rtx);
bool aarch64_symbolic_address_p (rtx);
bool aarch64_uimm12_shift (HOST_WIDE_INT);
@@ -388,7 +415,7 @@ bool aarch64_use_return_insn_p (void);
const char *aarch64_mangle_builtin_type (const_tree);
const char *aarch64_output_casesi (rtx *);
-enum aarch64_symbol_type aarch64_classify_symbol (rtx, rtx);
+enum aarch64_symbol_type aarch64_classify_symbol (rtx, HOST_WIDE_INT);
enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
enum reg_class aarch64_regno_regclass (unsigned);
int aarch64_asm_preferred_eh_data_format (int, int);
@@ -403,6 +430,8 @@ const char *aarch64_output_move_struct (rtx *operands);
rtx aarch64_return_addr (int, rtx);
rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
bool aarch64_simd_mem_operand_p (rtx);
+bool aarch64_sve_ld1r_operand_p (rtx);
+bool aarch64_sve_ldr_operand_p (rtx);
rtx aarch64_simd_vect_par_cnst_half (machine_mode, int, bool);
rtx aarch64_tls_get_addr (void);
tree aarch64_fold_builtin (tree, int, tree *, bool);
@@ -414,7 +443,9 @@ const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *);
const char * aarch64_output_probe_stack_range (rtx, rtx);
void aarch64_err_no_fpadvsimd (machine_mode, const char *);
void aarch64_expand_epilogue (bool);
-void aarch64_expand_mov_immediate (rtx, rtx);
+void aarch64_expand_mov_immediate (rtx, rtx, rtx (*) (rtx, rtx) = 0);
+void aarch64_emit_sve_pred_move (rtx, rtx, rtx);
+void aarch64_expand_sve_mem_move (rtx, rtx, machine_mode);
void aarch64_expand_prologue (void);
void aarch64_expand_vector_init (rtx, rtx);
void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
@@ -467,6 +498,10 @@ void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
bool aarch64_gen_adjusted_ldpstp (rtx *, bool, scalar_mode, RTX_CODE);
+
+void aarch64_expand_sve_vec_cmp_int (rtx, rtx_code, rtx, rtx);
+bool aarch64_expand_sve_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
+void aarch64_expand_sve_vcond (machine_mode, machine_mode, rtx *);
#endif /* RTX_CODE */
void aarch64_init_builtins (void);
@@ -485,6 +520,7 @@ tree aarch64_builtin_vectorized_function (unsigned int, tree, tree);
extern void aarch64_split_combinev16qi (rtx operands[3]);
extern void aarch64_expand_vec_perm (rtx, rtx, rtx, rtx, unsigned int);
+extern void aarch64_expand_sve_vec_perm (rtx, rtx, rtx, rtx);
extern bool aarch64_madd_needs_nop (rtx_insn *);
extern void aarch64_final_prescan_insn (rtx_insn *);
void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
@@ -508,4 +544,6 @@ std::string aarch64_get_extension_string_for_isa_flags (unsigned long,
rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
+poly_uint64 aarch64_regmode_natural_size (machine_mode);
+
#endif /* GCC_AARCH64_PROTOS_H */
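Among the new prototypes, aarch64_sve_pred_mode maps an element size in bytes
to the matching predicate mode; the big-endian reload expander in
aarch64-sve.md later calls it as
aarch64_sve_pred_mode (GET_MODE_UNIT_SIZE (mode)).require ().  The
correspondence it implies, shown as a stand-alone sketch (the real
implementation lives in aarch64.c and may differ in detail):

#include <stddef.h>

/* Sketch only: one predicate bit per byte of data vector, so the
   predicate mode depends solely on the element size.  */
static const char *
sve_pred_mode_name (unsigned int elem_bytes)
{
  switch (elem_bytes)
    {
    case 1: return "VNx16BI";  /* 8-bit elements  */
    case 2: return "VNx8BI";   /* 16-bit elements */
    case 4: return "VNx4BI";   /* 32-bit elements */
    case 8: return "VNx2BI";   /* 64-bit elements */
    default: return NULL;      /* not an SVE element size */
    }
}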
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
new file mode 100644
index 0000000..352c306
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -0,0 +1,1922 @@
+;; Machine description for AArch64 SVE.
+;; Copyright (C) 2009-2018 Free Software Foundation, Inc.
+;; Contributed by ARM Ltd.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Note on the handling of big-endian SVE
+;; --------------------------------------
+;;
+;; On big-endian systems, Advanced SIMD mov<mode> patterns act in the
+;; same way as movdi or movti would: the first byte of memory goes
+;; into the most significant byte of the register and the last byte
+;; of memory goes into the least significant byte of the register.
+;; This is the most natural ordering for Advanced SIMD and matches
+;; the ABI layout for 64-bit and 128-bit vector types.
+;;
+;; As a result, the order of bytes within the register is what GCC
+;; expects for a big-endian target, and subreg offsets therefore work
+;; as expected, with the first element in memory having subreg offset 0
+;; and the last element in memory having the subreg offset associated
+;; with a big-endian lowpart. However, this ordering also means that
+;; GCC's lane numbering does not match the architecture's numbering:
+;; GCC always treats the element at the lowest address in memory
+;; (subreg offset 0) as element 0, while the architecture treats
+;; the least significant end of the register as element 0.
+;;
+;; The situation for SVE is different. We want the layout of the
+;; SVE register to be the same for mov<mode> as it is for maskload<mode>:
+;; logically, a mov<mode> load must be indistinguishable from a
+;; maskload<mode> whose mask is all true. We therefore need the
+;; register layout to match LD1 rather than LDR. The ABI layout of
+;; SVE types also matches LD1 byte ordering rather than LDR byte ordering.
+;;
+;; As a result, the architecture lane numbering matches GCC's lane
+;; numbering, with element 0 always being the first in memory.
+;; However:
+;;
+;; - Applying a subreg offset to a register does not give the element
+;; that GCC expects: the first element in memory has the subreg offset
+;; associated with a big-endian lowpart while the last element in memory
+;; has subreg offset 0. We handle this via TARGET_CAN_CHANGE_MODE_CLASS.
+;;
+;; - We cannot use LDR and STR for spill slots that might be accessed
+;; via subregs, since although the elements have the order GCC expects,
+;; the order of the bytes within the elements is different. We instead
+;; access spill slots via LD1 and ST1, using secondary reloads to
+;; reserve a predicate register.
+
+
+;; SVE data moves.
+(define_expand "mov<mode>"
+ [(set (match_operand:SVE_ALL 0 "nonimmediate_operand")
+ (match_operand:SVE_ALL 1 "general_operand"))]
+ "TARGET_SVE"
+ {
+ /* Use the predicated load and store patterns where possible.
+ This is required for big-endian targets (see the comment at the
+ head of the file) and increases the addressing choices for
+ little-endian. */
+ if ((MEM_P (operands[0]) || MEM_P (operands[1]))
+ && can_create_pseudo_p ())
+ {
+ aarch64_expand_sve_mem_move (operands[0], operands[1], <VPRED>mode);
+ DONE;
+ }
+
+ if (CONSTANT_P (operands[1]))
+ {
+ aarch64_expand_mov_immediate (operands[0], operands[1],
+ gen_vec_duplicate<mode>);
+ DONE;
+ }
+ }
+)
+
+;; Unpredicated moves (little-endian). Only allow memory operations
+;; during and after RA; before RA we want the predicated load and
+;; store patterns to be used instead.
+(define_insn "*aarch64_sve_mov<mode>_le"
+ [(set (match_operand:SVE_ALL 0 "aarch64_sve_nonimmediate_operand" "=w, Utr, w, w")
+ (match_operand:SVE_ALL 1 "aarch64_sve_general_operand" "Utr, w, w, Dn"))]
+ "TARGET_SVE
+ && !BYTES_BIG_ENDIAN
+ && ((lra_in_progress || reload_completed)
+ || (register_operand (operands[0], <MODE>mode)
+ && nonmemory_operand (operands[1], <MODE>mode)))"
+ "@
+ ldr\t%0, %1
+ str\t%1, %0
+ mov\t%0.d, %1.d
+ * return aarch64_output_sve_mov_immediate (operands[1]);"
+)
+
+;; Unpredicated moves (big-endian). Memory accesses require secondary
+;; reloads.
+(define_insn "*aarch64_sve_mov<mode>_be"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w")
+ (match_operand:SVE_ALL 1 "aarch64_nonmemory_operand" "w, Dn"))]
+ "TARGET_SVE && BYTES_BIG_ENDIAN"
+ "@
+ mov\t%0.d, %1.d
+ * return aarch64_output_sve_mov_immediate (operands[1]);"
+)
+
+;; Handle big-endian memory reloads. We use byte PTRUE for all modes
+;; to try to encourage reuse.
+(define_expand "aarch64_sve_reload_be"
+ [(parallel
+ [(set (match_operand 0)
+ (match_operand 1))
+ (clobber (match_operand:VNx16BI 2 "register_operand" "=Upl"))])]
+ "TARGET_SVE && BYTES_BIG_ENDIAN"
+ {
+ /* Create a PTRUE. */
+ emit_move_insn (operands[2], CONSTM1_RTX (VNx16BImode));
+
+ /* Refer to the PTRUE in the appropriate mode for this move. */
+ machine_mode mode = GET_MODE (operands[0]);
+ machine_mode pred_mode
+ = aarch64_sve_pred_mode (GET_MODE_UNIT_SIZE (mode)).require ();
+ rtx pred = gen_lowpart (pred_mode, operands[2]);
+
+ /* Emit a predicated load or store. */
+ aarch64_emit_sve_pred_move (operands[0], pred, operands[1]);
+ DONE;
+ }
+)
+
+;; A predicated load or store for which the predicate is known to be
+;; all-true. Note that this pattern is generated directly by
+;; aarch64_emit_sve_pred_move, so changes to this pattern will
+;; need changes there as well.
+(define_insn "*pred_mov<mode>"
+ [(set (match_operand:SVE_ALL 0 "nonimmediate_operand" "=w, m")
+ (unspec:SVE_ALL
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (match_operand:SVE_ALL 2 "nonimmediate_operand" "m, w")]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE
+ && (register_operand (operands[0], <MODE>mode)
+ || register_operand (operands[2], <MODE>mode))"
+ "@
+ ld1<Vesize>\t%0.<Vetype>, %1/z, %2
+ st1<Vesize>\t%2.<Vetype>, %1, %0"
+)
+
+(define_expand "movmisalign<mode>"
+ [(set (match_operand:SVE_ALL 0 "nonimmediate_operand")
+ (match_operand:SVE_ALL 1 "general_operand"))]
+ "TARGET_SVE"
+ {
+ /* Equivalent to a normal move for our purposes. */
+ emit_move_insn (operands[0], operands[1]);
+ DONE;
+ }
+)
+
+(define_insn "maskload<mode><vpred>"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+ (unspec:SVE_ALL
+ [(match_operand:<VPRED> 2 "register_operand" "Upl")
+ (match_operand:SVE_ALL 1 "memory_operand" "m")]
+ UNSPEC_LD1_SVE))]
+ "TARGET_SVE"
+ "ld1<Vesize>\t%0.<Vetype>, %2/z, %1"
+)
+
+(define_insn "maskstore<mode><vpred>"
+ [(set (match_operand:SVE_ALL 0 "memory_operand" "+m")
+ (unspec:SVE_ALL [(match_operand:<VPRED> 2 "register_operand" "Upl")
+ (match_operand:SVE_ALL 1 "register_operand" "w")
+ (match_dup 0)]
+ UNSPEC_ST1_SVE))]
+ "TARGET_SVE"
+ "st1<Vesize>\t%1.<Vetype>, %2, %0"
+)
+
+(define_expand "mov<mode>"
+ [(set (match_operand:PRED_ALL 0 "nonimmediate_operand")
+ (match_operand:PRED_ALL 1 "general_operand"))]
+ "TARGET_SVE"
+ {
+ if (GET_CODE (operands[0]) == MEM)
+ operands[1] = force_reg (<MODE>mode, operands[1]);
+ }
+)
+
+(define_insn "*aarch64_sve_mov<mode>"
+ [(set (match_operand:PRED_ALL 0 "nonimmediate_operand" "=Upa, m, Upa, Upa, Upa")
+ (match_operand:PRED_ALL 1 "general_operand" "Upa, Upa, m, Dz, Dm"))]
+ "TARGET_SVE
+ && (register_operand (operands[0], <MODE>mode)
+ || register_operand (operands[1], <MODE>mode))"
+ "@
+ mov\t%0.b, %1.b
+ str\t%1, %0
+ ldr\t%0, %1
+ pfalse\t%0.b
+ * return aarch64_output_ptrue (<MODE>mode, '<Vetype>');"
+)
+
+;; Handle extractions from a predicate by converting to an integer vector
+;; and extracting from there.
+(define_expand "vec_extract<vpred><Vel>"
+ [(match_operand:<VEL> 0 "register_operand")
+ (match_operand:<VPRED> 1 "register_operand")
+ (match_operand:SI 2 "nonmemory_operand")
+ ;; Dummy operand to which we can attach the iterator.
+ (reg:SVE_I V0_REGNUM)]
+ "TARGET_SVE"
+ {
+ rtx tmp = gen_reg_rtx (<MODE>mode);
+ emit_insn (gen_aarch64_sve_dup<mode>_const (tmp, operands[1],
+ CONST1_RTX (<MODE>mode),
+ CONST0_RTX (<MODE>mode)));
+ emit_insn (gen_vec_extract<mode><Vel> (operands[0], tmp, operands[2]));
+ DONE;
+ }
+)
+
+(define_expand "vec_extract<mode><Vel>"
+ [(set (match_operand:<VEL> 0 "register_operand")
+ (vec_select:<VEL>
+ (match_operand:SVE_ALL 1 "register_operand")
+ (parallel [(match_operand:SI 2 "nonmemory_operand")])))]
+ "TARGET_SVE"
+ {
+ poly_int64 val;
+ if (poly_int_rtx_p (operands[2], &val)
+ && known_eq (val, GET_MODE_NUNITS (<MODE>mode) - 1))
+ {
+ /* The last element can be extracted with a LASTB and a false
+ predicate. */
+ rtx sel = force_reg (<VPRED>mode, CONST0_RTX (<VPRED>mode));
+ emit_insn (gen_aarch64_sve_lastb<mode> (operands[0], sel,
+ operands[1]));
+ DONE;
+ }
+ if (!CONST_INT_P (operands[2]))
+ {
+ /* Create an index with operand[2] as the base and -1 as the step.
+ It will then be zero for the element we care about. */
+ rtx index = gen_lowpart (<VEL_INT>mode, operands[2]);
+ index = force_reg (<VEL_INT>mode, index);
+ rtx series = gen_reg_rtx (<V_INT_EQUIV>mode);
+ emit_insn (gen_vec_series<v_int_equiv> (series, index, constm1_rtx));
+
+ /* Get a predicate that is true for only that element. */
+ rtx zero = CONST0_RTX (<V_INT_EQUIV>mode);
+ rtx cmp = gen_rtx_EQ (<V_INT_EQUIV>mode, series, zero);
+ rtx sel = gen_reg_rtx (<VPRED>mode);
+ emit_insn (gen_vec_cmp<v_int_equiv><vpred> (sel, cmp, series, zero));
+
+ /* Select the element using LASTB. */
+ emit_insn (gen_aarch64_sve_lastb<mode> (operands[0], sel,
+ operands[1]));
+ DONE;
+ }
+ }
+)
+
+;; Extract an element from the Advanced SIMD portion of the register.
+;; We don't just reuse the aarch64-simd.md pattern because we don't
+;; want any change in lane number on big-endian targets.
+(define_insn "*vec_extract<mode><Vel>_v128"
+ [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
+ (vec_select:<VEL>
+ (match_operand:SVE_ALL 1 "register_operand" "w, w, w")
+ (parallel [(match_operand:SI 2 "const_int_operand")])))]
+ "TARGET_SVE
+ && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 0, 15)"
+ {
+ operands[1] = gen_lowpart (<V128>mode, operands[1]);
+ switch (which_alternative)
+ {
+ case 0:
+ return "umov\\t%<vwcore>0, %1.<Vetype>[%2]";
+ case 1:
+ return "dup\\t%<Vetype>0, %1.<Vetype>[%2]";
+ case 2:
+ return "st1\\t{%1.<Vetype>}[%2], %0";
+ default:
+ gcc_unreachable ();
+ }
+ }
+ [(set_attr "type" "neon_to_gp_q, neon_dup_q, neon_store1_one_lane_q")]
+)
+
+;; Extract an element in the range of DUP. This pattern allows the
+;; source and destination to be different.
+(define_insn "*vec_extract<mode><Vel>_dup"
+ [(set (match_operand:<VEL> 0 "register_operand" "=w")
+ (vec_select:<VEL>
+ (match_operand:SVE_ALL 1 "register_operand" "w")
+ (parallel [(match_operand:SI 2 "const_int_operand")])))]
+ "TARGET_SVE
+ && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 16, 63)"
+ {
+ operands[0] = gen_rtx_REG (<MODE>mode, REGNO (operands[0]));
+ return "dup\t%0.<Vetype>, %1.<Vetype>[%2]";
+ }
+)
+
+;; Extract an element outside the range of DUP. This pattern requires the
+;; source and destination to be the same.
+(define_insn "*vec_extract<mode><Vel>_ext"
+ [(set (match_operand:<VEL> 0 "register_operand" "=w")
+ (vec_select:<VEL>
+ (match_operand:SVE_ALL 1 "register_operand" "0")
+ (parallel [(match_operand:SI 2 "const_int_operand")])))]
+ "TARGET_SVE && INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode) >= 64"
+ {
+ operands[0] = gen_rtx_REG (<MODE>mode, REGNO (operands[0]));
+ operands[2] = GEN_INT (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode));
+ return "ext\t%0.b, %0.b, %0.b, #%2";
+ }
+)
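+
+;; For example, extracting element 20 of a vector of 32-bit elements falls
+;; outside the range of DUP, so the lane index is converted to the byte
+;; offset 80 and "ext\t%0.b, %0.b, %0.b, #80" rotates the required element
+;; down into the low lane.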
+
+;; Extract the last active element of operand 1 into operand 0.
+;; If no elements are active, extract the last inactive element instead.
+(define_insn "aarch64_sve_lastb<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand" "=r, w")
+ (unspec:<VEL>
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (match_operand:SVE_ALL 2 "register_operand" "w, w")]
+ UNSPEC_LASTB))]
+ "TARGET_SVE"
+ "@
+ lastb\t%<vwcore>0, %1, %2.<Vetype>
+ lastb\t%<Vetype>0, %1, %2.<Vetype>"
+)
+
+(define_expand "vec_duplicate<mode>"
+ [(parallel
+ [(set (match_operand:SVE_ALL 0 "register_operand")
+ (vec_duplicate:SVE_ALL
+ (match_operand:<VEL> 1 "aarch64_sve_dup_operand")))
+ (clobber (scratch:<VPRED>))])]
+ "TARGET_SVE"
+ {
+ if (MEM_P (operands[1]))
+ {
+ rtx ptrue = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ emit_insn (gen_sve_ld1r<mode> (operands[0], ptrue, operands[1],
+ CONST0_RTX (<MODE>mode)));
+ DONE;
+ }
+ }
+)
+
+;; Accept memory operands for the benefit of combine, and also in case
+;; the scalar input gets spilled to memory during RA. We want to split
+;; the load at the first opportunity in order to allow the PTRUE to be
+;; optimized with surrounding code.
+(define_insn_and_split "*vec_duplicate<mode>_reg"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w, w")
+ (vec_duplicate:SVE_ALL
+ (match_operand:<VEL> 1 "aarch64_sve_dup_operand" "r, w, Uty")))
+ (clobber (match_scratch:<VPRED> 2 "=X, X, Upl"))]
+ "TARGET_SVE"
+ "@
+ mov\t%0.<Vetype>, %<vwcore>1
+ mov\t%0.<Vetype>, %<Vetype>1
+ #"
+ "&& MEM_P (operands[1])"
+ [(const_int 0)]
+ {
+ if (GET_CODE (operands[2]) == SCRATCH)
+ operands[2] = gen_reg_rtx (<VPRED>mode);
+ emit_move_insn (operands[2], CONSTM1_RTX (<VPRED>mode));
+ emit_insn (gen_sve_ld1r<mode> (operands[0], operands[2], operands[1],
+ CONST0_RTX (<MODE>mode)));
+ DONE;
+ }
+ [(set_attr "length" "4,4,8")]
+)
+
+;; This is used for vec_duplicate<mode>s from memory, but can also
+;; be used by combine to optimize selects of a vec_duplicate<mode>
+;; with zero.
+(define_insn "sve_ld1r<mode>"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+ (unspec:SVE_ALL
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (vec_duplicate:SVE_ALL
+ (match_operand:<VEL> 2 "aarch64_sve_ld1r_operand" "Uty"))
+ (match_operand:SVE_ALL 3 "aarch64_simd_imm_zero")]
+ UNSPEC_SEL))]
+ "TARGET_SVE"
+ "ld1r<Vesize>\t%0.<Vetype>, %1/z, %2"
+)
+
+;; Load 128 bits from memory and duplicate to fill a vector. Since there
+;; are so few operations on 128-bit "elements", we don't define a VNx1TI
+;; and simply use vectors of bytes instead.
+(define_insn "sve_ld1rq"
+ [(set (match_operand:VNx16QI 0 "register_operand" "=w")
+ (unspec:VNx16QI
+ [(match_operand:VNx16BI 1 "register_operand" "Upl")
+ (match_operand:TI 2 "aarch64_sve_ld1r_operand" "Uty")]
+ UNSPEC_LD1RQ))]
+ "TARGET_SVE"
+ "ld1rqb\t%0.b, %1/z, %2"
+)
+
+;; Implement a predicate broadcast by shifting the low bit of the scalar
+;; input into the top bit and using a WHILELO. An alternative would be to
+;; duplicate the input and do a compare with zero.
+(define_expand "vec_duplicate<mode>"
+ [(set (match_operand:PRED_ALL 0 "register_operand")
+ (vec_duplicate:PRED_ALL (match_operand 1 "register_operand")))]
+ "TARGET_SVE"
+ {
+ rtx tmp = gen_reg_rtx (DImode);
+ rtx op1 = gen_lowpart (DImode, operands[1]);
+ emit_insn (gen_ashldi3 (tmp, op1, gen_int_mode (63, DImode)));
+ emit_insn (gen_while_ultdi<mode> (operands[0], const0_rtx, tmp));
+ DONE;
+ }
+)
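+
+;; To see why the sequence above implements the broadcast: after the shift,
+;; the scalar is either 0 or 1 << 63 (unsigned).  WHILELO then enables
+;; element I iff I is less than that limit, which holds for every element
+;; when the limit is 1 << 63 and for no element when the limit is 0, giving
+;; an all-true or all-false predicate respectively.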
+
+(define_insn "vec_series<mode>"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w, w, w")
+ (vec_series:SVE_I
+ (match_operand:<VEL> 1 "aarch64_sve_index_operand" "Usi, r, r")
+ (match_operand:<VEL> 2 "aarch64_sve_index_operand" "r, Usi, r")))]
+ "TARGET_SVE"
+ "@
+ index\t%0.<Vetype>, #%1, %<vw>2
+ index\t%0.<Vetype>, %<vw>1, #%2
+ index\t%0.<Vetype>, %<vw>1, %<vw>2"
+)
+
+;; Optimize {x, x, x, x, ...} + {0, n, 2*n, 3*n, ...} if n is in range
+;; of an INDEX instruction.
+(define_insn "*vec_series<mode>_plus"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w")
+ (plus:SVE_I
+ (vec_duplicate:SVE_I
+ (match_operand:<VEL> 1 "register_operand" "r"))
+ (match_operand:SVE_I 2 "immediate_operand")))]
+ "TARGET_SVE && aarch64_check_zero_based_sve_index_immediate (operands[2])"
+ {
+ operands[2] = aarch64_check_zero_based_sve_index_immediate (operands[2]);
+ return "index\t%0.<Vetype>, %<vw>1, #%2";
+ }
+)
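+
+;; For example, a broadcast of X added to the constant series {0, 3, 6, ...}
+;; becomes a single INDEX with X as the starting value and #3 as the
+;; immediate step.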
+
+(define_expand "vec_perm<mode>"
+ [(match_operand:SVE_ALL 0 "register_operand")
+ (match_operand:SVE_ALL 1 "register_operand")
+ (match_operand:SVE_ALL 2 "register_operand")
+ (match_operand:<V_INT_EQUIV> 3 "aarch64_sve_vec_perm_operand")]
+ "TARGET_SVE && GET_MODE_NUNITS (<MODE>mode).is_constant ()"
+ {
+ aarch64_expand_sve_vec_perm (operands[0], operands[1],
+ operands[2], operands[3]);
+ DONE;
+ }
+)
+
+(define_insn "*aarch64_sve_tbl<mode>"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+ (unspec:SVE_ALL
+ [(match_operand:SVE_ALL 1 "register_operand" "w")
+ (match_operand:<V_INT_EQUIV> 2 "register_operand" "w")]
+ UNSPEC_TBL))]
+ "TARGET_SVE"
+ "tbl\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+(define_insn "*aarch64_sve_<perm_insn><perm_hilo><mode>"
+ [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (unspec:PRED_ALL [(match_operand:PRED_ALL 1 "register_operand" "Upa")
+ (match_operand:PRED_ALL 2 "register_operand" "Upa")]
+ PERMUTE))]
+ "TARGET_SVE"
+ "<perm_insn><perm_hilo>\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+(define_insn "*aarch64_sve_<perm_insn><perm_hilo><mode>"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+ (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "w")
+ (match_operand:SVE_ALL 2 "register_operand" "w")]
+ PERMUTE))]
+ "TARGET_SVE"
+ "<perm_insn><perm_hilo>\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+(define_insn "*aarch64_sve_rev64<mode>"
+ [(set (match_operand:SVE_BHS 0 "register_operand" "=w")
+ (unspec:SVE_BHS
+ [(match_operand:VNx2BI 1 "register_operand" "Upl")
+ (unspec:SVE_BHS [(match_operand:SVE_BHS 2 "register_operand" "w")]
+ UNSPEC_REV64)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "rev<Vesize>\t%0.d, %1/m, %2.d"
+)
+
+(define_insn "*aarch64_sve_rev32<mode>"
+ [(set (match_operand:SVE_BH 0 "register_operand" "=w")
+ (unspec:SVE_BH
+ [(match_operand:VNx4BI 1 "register_operand" "Upl")
+ (unspec:SVE_BH [(match_operand:SVE_BH 2 "register_operand" "w")]
+ UNSPEC_REV32)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "rev<Vesize>\t%0.s, %1/m, %2.s"
+)
+
+(define_insn "*aarch64_sve_rev16vnx16qi"
+ [(set (match_operand:VNx16QI 0 "register_operand" "=w")
+ (unspec:VNx16QI
+ [(match_operand:VNx8BI 1 "register_operand" "Upl")
+ (unspec:VNx16QI [(match_operand:VNx16QI 2 "register_operand" "w")]
+ UNSPEC_REV16)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "revb\t%0.h, %1/m, %2.h"
+)
+
+(define_insn "*aarch64_sve_rev<mode>"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+ (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "w")]
+ UNSPEC_REV))]
+ "TARGET_SVE"
+ "rev\t%0.<Vetype>, %1.<Vetype>")
+
+(define_insn "*aarch64_sve_dup_lane<mode>"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+ (vec_duplicate:SVE_ALL
+ (vec_select:<VEL>
+ (match_operand:SVE_ALL 1 "register_operand" "w")
+ (parallel [(match_operand:SI 2 "const_int_operand")]))))]
+ "TARGET_SVE
+ && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 0, 63)"
+ "dup\t%0.<Vetype>, %1.<Vetype>[%2]"
+)
+
+;; Note that the immediate (third) operand is the lane index not
+;; the byte index.
+(define_insn "*aarch64_sve_ext<mode>"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+ (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "0")
+ (match_operand:SVE_ALL 2 "register_operand" "w")
+ (match_operand:SI 3 "const_int_operand")]
+ UNSPEC_EXT))]
+ "TARGET_SVE
+ && IN_RANGE (INTVAL (operands[3]) * GET_MODE_SIZE (<VEL>mode), 0, 255)"
+ {
+ operands[3] = GEN_INT (INTVAL (operands[3]) * GET_MODE_SIZE (<VEL>mode));
+ return "ext\\t%0.b, %0.b, %2.b, #%3";
+ }
+)
+
+(define_insn "add<mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w, w, w, w")
+ (plus:SVE_I
+ (match_operand:SVE_I 1 "register_operand" "%0, 0, 0, w")
+ (match_operand:SVE_I 2 "aarch64_sve_add_operand" "vsa, vsn, vsi, w")))]
+ "TARGET_SVE"
+ "@
+ add\t%0.<Vetype>, %0.<Vetype>, #%D2
+ sub\t%0.<Vetype>, %0.<Vetype>, #%N2
+ * return aarch64_output_sve_inc_dec_immediate (\"%0.<Vetype>\", operands[2]);
+ add\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+(define_insn "sub<mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+ (minus:SVE_I
+ (match_operand:SVE_I 1 "aarch64_sve_arith_operand" "w, vsa")
+ (match_operand:SVE_I 2 "register_operand" "w, 0")))]
+ "TARGET_SVE"
+ "@
+ sub\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>
+ subr\t%0.<Vetype>, %0.<Vetype>, #%D1"
+)
+
+;; Unpredicated multiplication.
+(define_expand "mul<mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand")
+ (unspec:SVE_I
+ [(match_dup 3)
+ (mult:SVE_I
+ (match_operand:SVE_I 1 "register_operand")
+ (match_operand:SVE_I 2 "aarch64_sve_mul_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Multiplication predicated with a PTRUE. We don't actually need the
+;; predicate for the first alternative, but using Upa or X isn't likely
+;; to gain much and would make the instruction seem less uniform to the
+;; register allocator.
+(define_insn "*mul<mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+ (unspec:SVE_I
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (mult:SVE_I
+ (match_operand:SVE_I 2 "register_operand" "%0, 0")
+ (match_operand:SVE_I 3 "aarch64_sve_mul_operand" "vsm, w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ mul\t%0.<Vetype>, %0.<Vetype>, #%3
+ mul\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+)
+
+(define_insn "*madd<mode>"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+ (plus:SVE_I
+ (unspec:SVE_I
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (mult:SVE_I (match_operand:SVE_I 2 "register_operand" "%0, w")
+ (match_operand:SVE_I 3 "register_operand" "w, w"))]
+ UNSPEC_MERGE_PTRUE)
+ (match_operand:SVE_I 4 "register_operand" "w, 0")))]
+ "TARGET_SVE"
+ "@
+ mad\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>
+ mla\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>"
+)
+
+(define_insn "*msub<mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+ (minus:SVE_I
+ (match_operand:SVE_I 4 "register_operand" "w, 0")
+ (unspec:SVE_I
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (mult:SVE_I (match_operand:SVE_I 2 "register_operand" "%0, w")
+ (match_operand:SVE_I 3 "register_operand" "w, w"))]
+ UNSPEC_MERGE_PTRUE)))]
+ "TARGET_SVE"
+ "@
+ msb\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>
+ mls\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated NEG, NOT and POPCOUNT.
+(define_expand "<optab><mode>2"
+ [(set (match_operand:SVE_I 0 "register_operand")
+ (unspec:SVE_I
+ [(match_dup 2)
+ (SVE_INT_UNARY:SVE_I (match_operand:SVE_I 1 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; NEG, NOT and POPCOUNT predicated with a PTRUE.
+(define_insn "*<optab><mode>2"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w")
+ (unspec:SVE_I
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (SVE_INT_UNARY:SVE_I
+ (match_operand:SVE_I 2 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
+)
+
+;; Vector AND, ORR and XOR.
+(define_insn "<optab><mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+ (LOGICAL:SVE_I
+ (match_operand:SVE_I 1 "register_operand" "%0, w")
+ (match_operand:SVE_I 2 "aarch64_sve_logical_operand" "vsl, w")))]
+ "TARGET_SVE"
+ "@
+ <logical>\t%0.<Vetype>, %0.<Vetype>, #%C2
+ <logical>\t%0.d, %1.d, %2.d"
+)
+
+;; Vector AND, ORR and XOR on floating-point modes. We avoid subregs
+;; by providing this, but we need to use UNSPECs since rtx logical ops
+;; aren't defined for floating-point modes.
+(define_insn "*<optab><mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w")
+ (unspec:SVE_F [(match_operand:SVE_F 1 "register_operand" "w")
+ (match_operand:SVE_F 2 "register_operand" "w")]
+ LOGICALF))]
+ "TARGET_SVE"
+ "<logicalf_op>\t%0.d, %1.d, %2.d"
+)
+
+;; REG_EQUAL notes on "not<mode>3" should ensure that we can generate
+;; this pattern even though the NOT instruction itself is predicated.
+(define_insn "bic<mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w")
+ (and:SVE_I
+ (not:SVE_I (match_operand:SVE_I 1 "register_operand" "w"))
+ (match_operand:SVE_I 2 "register_operand" "w")))]
+ "TARGET_SVE"
+ "bic\t%0.d, %2.d, %1.d"
+)
+
+;; Predicate AND. We can reuse one of the inputs as the GP.
+(define_insn "and<mode>3"
+ [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand" "Upa")
+ (match_operand:PRED_ALL 2 "register_operand" "Upa")))]
+ "TARGET_SVE"
+ "and\t%0.b, %1/z, %1.b, %2.b"
+)
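+
+;; Reusing operand 1 as the governing predicate above is safe because the
+;; zeroing form computes %1 & %1 & %2, which is simply %1 & %2.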
+
+;; Unpredicated predicate ORR and XOR.
+(define_expand "<optab><mode>3"
+ [(set (match_operand:PRED_ALL 0 "register_operand")
+ (and:PRED_ALL
+ (LOGICAL_OR:PRED_ALL
+ (match_operand:PRED_ALL 1 "register_operand")
+ (match_operand:PRED_ALL 2 "register_operand"))
+ (match_dup 3)))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<MODE>mode, CONSTM1_RTX (<MODE>mode));
+ }
+)
+
+;; Predicated predicate ORR and XOR.
+(define_insn "pred_<optab><mode>3"
+ [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (and:PRED_ALL
+ (LOGICAL:PRED_ALL
+ (match_operand:PRED_ALL 2 "register_operand" "Upa")
+ (match_operand:PRED_ALL 3 "register_operand" "Upa"))
+ (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+ "TARGET_SVE"
+ "<logical>\t%0.b, %1/z, %2.b, %3.b"
+)
+
+;; Perform a logical operation on operands 2 and 3, using operand 1 as
+;; the GP (which is known to be a PTRUE). Store the result in operand 0
+;; and set the flags in the same way as for PTEST. The (and ...) in the
+;; UNSPEC_PTEST_PTRUE is logically redundant, but means that the tested
+;; value is structurally equivalent to the rhs of the second set.
+(define_insn "*<optab><mode>3_cc"
+ [(set (reg:CC CC_REGNUM)
+ (compare:CC
+ (unspec:SI [(match_operand:PRED_ALL 1 "register_operand" "Upa")
+ (and:PRED_ALL
+ (LOGICAL:PRED_ALL
+ (match_operand:PRED_ALL 2 "register_operand" "Upa")
+ (match_operand:PRED_ALL 3 "register_operand" "Upa"))
+ (match_dup 1))]
+ UNSPEC_PTEST_PTRUE)
+ (const_int 0)))
+ (set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (and:PRED_ALL (LOGICAL:PRED_ALL (match_dup 2) (match_dup 3))
+ (match_dup 1)))]
+ "TARGET_SVE"
+ "<logical>s\t%0.b, %1/z, %2.b, %3.b"
+)
+
+;; Unpredicated predicate inverse.
+(define_expand "one_cmpl<mode>2"
+ [(set (match_operand:PRED_ALL 0 "register_operand")
+ (and:PRED_ALL
+ (not:PRED_ALL (match_operand:PRED_ALL 1 "register_operand"))
+ (match_dup 2)))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<MODE>mode, CONSTM1_RTX (<MODE>mode));
+ }
+)
+
+;; Predicated predicate inverse.
+(define_insn "*one_cmpl<mode>3"
+ [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (and:PRED_ALL
+ (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+ (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+ "TARGET_SVE"
+ "not\t%0.b, %1/z, %2.b"
+)
+
+;; Predicated predicate BIC and ORN.
+(define_insn "*<nlogical><mode>3"
+ [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (and:PRED_ALL
+ (NLOGICAL:PRED_ALL
+ (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+ (match_operand:PRED_ALL 3 "register_operand" "Upa"))
+ (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+ "TARGET_SVE"
+ "<nlogical>\t%0.b, %1/z, %3.b, %2.b"
+)
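+
+;; Note that the output above swaps operands 2 and 3: the rtl negates the
+;; first logical input, whereas BIC and ORN negate their final source
+;; register, so "%3.b, %2.b" computes %3 <op> ~%2 as the rtl requires.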
+
+;; Predicated predicate NAND and NOR.
+(define_insn "*<logical_nn><mode>3"
+ [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (and:PRED_ALL
+ (NLOGICAL:PRED_ALL
+ (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+ (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand" "Upa")))
+ (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+ "TARGET_SVE"
+ "<logical_nn>\t%0.b, %1/z, %2.b, %3.b"
+)
+
+;; Unpredicated LSL, LSR and ASR by a vector.
+(define_expand "v<optab><mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand")
+ (unspec:SVE_I
+ [(match_dup 3)
+ (ASHIFT:SVE_I
+ (match_operand:SVE_I 1 "register_operand")
+ (match_operand:SVE_I 2 "aarch64_sve_<lr>shift_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; LSL, LSR and ASR by a vector, predicated with a PTRUE. We don't
+;; actually need the predicate for the first alternative, but using Upa
+;; or X isn't likely to gain much and would make the instruction seem
+;; less uniform to the register allocator.
+(define_insn "*v<optab><mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+ (unspec:SVE_I
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (ASHIFT:SVE_I
+ (match_operand:SVE_I 2 "register_operand" "w, 0")
+ (match_operand:SVE_I 3 "aarch64_sve_<lr>shift_operand" "D<lr>, w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ <shift>\t%0.<Vetype>, %2.<Vetype>, #%3
+ <shift>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+)
+
+;; LSL, LSR and ASR by a scalar, which expands into one of the vector
+;; shifts above.
+(define_expand "<ASHIFT:optab><mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand")
+ (ASHIFT:SVE_I (match_operand:SVE_I 1 "register_operand")
+ (match_operand:<VEL> 2 "general_operand")))]
+ "TARGET_SVE"
+ {
+ rtx amount;
+ if (CONST_INT_P (operands[2]))
+ {
+ amount = gen_const_vec_duplicate (<MODE>mode, operands[2]);
+ if (!aarch64_sve_<lr>shift_operand (operands[2], <MODE>mode))
+ amount = force_reg (<MODE>mode, amount);
+ }
+ else
+ {
+ amount = gen_reg_rtx (<MODE>mode);
+ emit_insn (gen_vec_duplicate<mode> (amount,
+ convert_to_mode (<VEL>mode,
+ operands[2], 0)));
+ }
+ emit_insn (gen_v<optab><mode>3 (operands[0], operands[1], amount));
+ DONE;
+ }
+)
+
+;; Test all bits of operand 1.  Operand 0 is a GP that is known to hold a PTRUE.
+;;
+;; Using UNSPEC_PTEST_PTRUE allows combine patterns to assume that the GP
+;; is a PTRUE even if the optimizers haven't yet been able to propagate
+;; the constant. We would use a separate unspec code for PTESTs involving
+;; GPs that might not be PTRUEs.
+(define_insn "ptest_ptrue<mode>"
+ [(set (reg:CC CC_REGNUM)
+ (compare:CC
+ (unspec:SI [(match_operand:PRED_ALL 0 "register_operand" "Upa")
+ (match_operand:PRED_ALL 1 "register_operand" "Upa")]
+ UNSPEC_PTEST_PTRUE)
+ (const_int 0)))]
+ "TARGET_SVE"
+ "ptest\t%0, %1.b"
+)
+
+;; Set element I of the result if operand1 + J < operand2 for all J in [0, I],
+;; with the comparison being unsigned.
+(define_insn "while_ult<GPI:mode><PRED_ALL:mode>"
+ [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (unspec:PRED_ALL [(match_operand:GPI 1 "aarch64_reg_or_zero" "rZ")
+ (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")]
+ UNSPEC_WHILE_LO))
+ (clobber (reg:CC CC_REGNUM))]
+ "TARGET_SVE"
+ "whilelo\t%0.<PRED_ALL:Vetype>, %<w>1, %<w>2"
+)
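+
+;; For example, WHILELO with operands 3 and 6 enables elements 0 to 2
+;; (since 3, 4 and 5 are all below 6) and disables every later element.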
+
+;; WHILELO sets the flags in the same way as a PTEST with a PTRUE GP.
+;; Handle the case in which both results are useful. The GP operand
+;; to the PTEST isn't needed, so we allow it to be anything.
+(define_insn_and_split "while_ult<GPI:mode><PRED_ALL:mode>_cc"
+ [(set (reg:CC CC_REGNUM)
+ (compare:CC
+ (unspec:SI [(match_operand:PRED_ALL 1)
+ (unspec:PRED_ALL
+ [(match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")
+ (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")]
+ UNSPEC_WHILE_LO)]
+ UNSPEC_PTEST_PTRUE)
+ (const_int 0)))
+ (set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (unspec:PRED_ALL [(match_dup 2)
+ (match_dup 3)]
+ UNSPEC_WHILE_LO))]
+ "TARGET_SVE"
+ "whilelo\t%0.<PRED_ALL:Vetype>, %<w>2, %<w>3"
+ ;; Force the compiler to drop the unused predicate operand, so that we
+ ;; don't have an unnecessary PTRUE.
+ "&& !CONSTANT_P (operands[1])"
+ [(const_int 0)]
+ {
+ emit_insn (gen_while_ult<GPI:mode><PRED_ALL:mode>_cc
+ (operands[0], CONSTM1_RTX (<MODE>mode),
+ operands[2], operands[3]));
+ DONE;
+ }
+)
+
+;; Predicated integer comparison.
+(define_insn "*vec_cmp<cmp_op>_<mode>"
+ [(set (match_operand:<VPRED> 0 "register_operand" "=Upa, Upa")
+ (unspec:<VPRED>
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (match_operand:SVE_I 2 "register_operand" "w, w")
+ (match_operand:SVE_I 3 "aarch64_sve_cmp_<imm_con>_operand" "<imm_con>, w")]
+ SVE_COND_INT_CMP))
+ (clobber (reg:CC CC_REGNUM))]
+ "TARGET_SVE"
+ "@
+ cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
+ cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Predicated integer comparison in which only the flags result is interesting.
+(define_insn "*vec_cmp<cmp_op>_<mode>_ptest"
+ [(set (reg:CC CC_REGNUM)
+ (compare:CC
+ (unspec:SI
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (unspec:<VPRED>
+ [(match_dup 1)
+ (match_operand:SVE_I 2 "register_operand" "w, w")
+ (match_operand:SVE_I 3 "aarch64_sve_cmp_<imm_con>_operand" "<imm_con>, w")]
+ SVE_COND_INT_CMP)]
+ UNSPEC_PTEST_PTRUE)
+ (const_int 0)))
+ (clobber (match_scratch:<VPRED> 0 "=Upa, Upa"))]
+ "TARGET_SVE"
+ "@
+ cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
+ cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Predicated comparison in which both the flag and predicate results
+;; are interesting.
+(define_insn "*vec_cmp<cmp_op>_<mode>_cc"
+ [(set (reg:CC CC_REGNUM)
+ (compare:CC
+ (unspec:SI
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (unspec:<VPRED>
+ [(match_dup 1)
+ (match_operand:SVE_I 2 "register_operand" "w, w")
+ (match_operand:SVE_I 3 "aarch64_sve_cmp_<imm_con>_operand" "<imm_con>, w")]
+ SVE_COND_INT_CMP)]
+ UNSPEC_PTEST_PTRUE)
+ (const_int 0)))
+ (set (match_operand:<VPRED> 0 "register_operand" "=Upa, Upa")
+ (unspec:<VPRED>
+ [(match_dup 1)
+ (match_dup 2)
+ (match_dup 3)]
+ SVE_COND_INT_CMP))]
+ "TARGET_SVE"
+ "@
+ cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
+ cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Predicated floating-point comparison (excluding FCMUO, which doesn't
+;; allow #0.0 as an operand).
+(define_insn "*vec_fcm<cmp_op><mode>"
+ [(set (match_operand:<VPRED> 0 "register_operand" "=Upa, Upa")
+ (unspec:<VPRED>
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (match_operand:SVE_F 2 "register_operand" "w, w")
+ (match_operand:SVE_F 3 "aarch64_simd_reg_or_zero" "Dz, w")]
+ SVE_COND_FP_CMP))]
+ "TARGET_SVE"
+ "@
+ fcm<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #0.0
+ fcm<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Predicated FCMUO.
+(define_insn "*vec_fcmuo<mode>"
+ [(set (match_operand:<VPRED> 0 "register_operand" "=Upa")
+ (unspec:<VPRED>
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (match_operand:SVE_F 2 "register_operand" "w")
+ (match_operand:SVE_F 3 "register_operand" "w")]
+ UNSPEC_COND_UO))]
+ "TARGET_SVE"
+ "fcmuo\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; vcond_mask operand order: true, false, mask
+;; UNSPEC_SEL operand order: mask, true, false (as for VEC_COND_EXPR)
+;; SEL operand order: mask, true, false
+(define_insn "vcond_mask_<mode><vpred>"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+ (unspec:SVE_ALL
+ [(match_operand:<VPRED> 3 "register_operand" "Upa")
+ (match_operand:SVE_ALL 1 "register_operand" "w")
+ (match_operand:SVE_ALL 2 "register_operand" "w")]
+ UNSPEC_SEL))]
+ "TARGET_SVE"
+ "sel\t%0.<Vetype>, %3, %1.<Vetype>, %2.<Vetype>"
+)
+
+;; Selects between a duplicated immediate and zero.
+(define_insn "aarch64_sve_dup<mode>_const"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w")
+ (unspec:SVE_I
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (match_operand:SVE_I 2 "aarch64_sve_dup_immediate")
+ (match_operand:SVE_I 3 "aarch64_simd_imm_zero")]
+ UNSPEC_SEL))]
+ "TARGET_SVE"
+ "mov\t%0.<Vetype>, %1/z, #%2"
+)
+
+;; Integer (signed) vcond. Don't enforce an immediate range here, since it
+;; depends on the comparison; leave it to aarch64_expand_sve_vcond instead.
+(define_expand "vcond<mode><v_int_equiv>"
+ [(set (match_operand:SVE_ALL 0 "register_operand")
+ (if_then_else:SVE_ALL
+ (match_operator 3 "comparison_operator"
+ [(match_operand:<V_INT_EQUIV> 4 "register_operand")
+ (match_operand:<V_INT_EQUIV> 5 "nonmemory_operand")])
+ (match_operand:SVE_ALL 1 "register_operand")
+ (match_operand:SVE_ALL 2 "register_operand")))]
+ "TARGET_SVE"
+ {
+ aarch64_expand_sve_vcond (<MODE>mode, <V_INT_EQUIV>mode, operands);
+ DONE;
+ }
+)
+
+;; Integer vcondu. Don't enforce an immediate range here, since it
+;; depends on the comparison; leave it to aarch64_expand_sve_vcond instead.
+(define_expand "vcondu<mode><v_int_equiv>"
+ [(set (match_operand:SVE_ALL 0 "register_operand")
+ (if_then_else:SVE_ALL
+ (match_operator 3 "comparison_operator"
+ [(match_operand:<V_INT_EQUIV> 4 "register_operand")
+ (match_operand:<V_INT_EQUIV> 5 "nonmemory_operand")])
+ (match_operand:SVE_ALL 1 "register_operand")
+ (match_operand:SVE_ALL 2 "register_operand")))]
+ "TARGET_SVE"
+ {
+ aarch64_expand_sve_vcond (<MODE>mode, <V_INT_EQUIV>mode, operands);
+ DONE;
+ }
+)
+
+;; Floating-point vcond. All comparisons except FCMUO allow a zero
+;; operand; aarch64_expand_sve_vcond handles the case of an FCMUO
+;; with zero.
+(define_expand "vcond<mode><v_fp_equiv>"
+ [(set (match_operand:SVE_SD 0 "register_operand")
+ (if_then_else:SVE_SD
+ (match_operator 3 "comparison_operator"
+ [(match_operand:<V_FP_EQUIV> 4 "register_operand")
+ (match_operand:<V_FP_EQUIV> 5 "aarch64_simd_reg_or_zero")])
+ (match_operand:SVE_SD 1 "register_operand")
+ (match_operand:SVE_SD 2 "register_operand")))]
+ "TARGET_SVE"
+ {
+ aarch64_expand_sve_vcond (<MODE>mode, <V_FP_EQUIV>mode, operands);
+ DONE;
+ }
+)
+
+;; Signed integer comparisons. Don't enforce an immediate range here, since
+;; it depends on the comparison; leave it to aarch64_expand_sve_vec_cmp_int
+;; instead.
+(define_expand "vec_cmp<mode><vpred>"
+ [(parallel
+ [(set (match_operand:<VPRED> 0 "register_operand")
+ (match_operator:<VPRED> 1 "comparison_operator"
+ [(match_operand:SVE_I 2 "register_operand")
+ (match_operand:SVE_I 3 "nonmemory_operand")]))
+ (clobber (reg:CC CC_REGNUM))])]
+ "TARGET_SVE"
+ {
+ aarch64_expand_sve_vec_cmp_int (operands[0], GET_CODE (operands[1]),
+ operands[2], operands[3]);
+ DONE;
+ }
+)
+
+;; Unsigned integer comparisons. Don't enforce an immediate range here, since
+;; it depends on the comparison; leave it to aarch64_expand_sve_vec_cmp_int
+;; instead.
+(define_expand "vec_cmpu<mode><vpred>"
+ [(parallel
+ [(set (match_operand:<VPRED> 0 "register_operand")
+ (match_operator:<VPRED> 1 "comparison_operator"
+ [(match_operand:SVE_I 2 "register_operand")
+ (match_operand:SVE_I 3 "nonmemory_operand")]))
+ (clobber (reg:CC CC_REGNUM))])]
+ "TARGET_SVE"
+ {
+ aarch64_expand_sve_vec_cmp_int (operands[0], GET_CODE (operands[1]),
+ operands[2], operands[3]);
+ DONE;
+ }
+)
+
+;; Floating-point comparisons. All comparisons except FCMUO allow a zero
+;; operand; aarch64_expand_sve_vec_cmp_float handles the case of an FCMUO
+;; with zero.
+(define_expand "vec_cmp<mode><vpred>"
+ [(set (match_operand:<VPRED> 0 "register_operand")
+ (match_operator:<VPRED> 1 "comparison_operator"
+ [(match_operand:SVE_F 2 "register_operand")
+ (match_operand:SVE_F 3 "aarch64_simd_reg_or_zero")]))]
+ "TARGET_SVE"
+ {
+ aarch64_expand_sve_vec_cmp_float (operands[0], GET_CODE (operands[1]),
+ operands[2], operands[3], false);
+ DONE;
+ }
+)
+
+;; Branch based on predicate equality or inequality.
+(define_expand "cbranch<mode>4"
+ [(set (pc)
+ (if_then_else
+ (match_operator 0 "aarch64_equality_operator"
+ [(match_operand:PRED_ALL 1 "register_operand")
+ (match_operand:PRED_ALL 2 "aarch64_simd_reg_or_zero")])
+ (label_ref (match_operand 3 ""))
+ (pc)))]
+ ""
+ {
+ rtx ptrue = force_reg (<MODE>mode, CONSTM1_RTX (<MODE>mode));
+ rtx pred;
+ if (operands[2] == CONST0_RTX (<MODE>mode))
+ pred = operands[1];
+ else
+ {
+ pred = gen_reg_rtx (<MODE>mode);
+ emit_insn (gen_pred_xor<mode>3 (pred, ptrue, operands[1],
+ operands[2]));
+ }
+ emit_insn (gen_ptest_ptrue<mode> (ptrue, pred));
+ operands[1] = gen_rtx_REG (CCmode, CC_REGNUM);
+ operands[2] = const0_rtx;
+ }
+)
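+
+;; For example, a branch on "p1 != p2" becomes an EOR of the two predicates
+;; (governed by a PTRUE), a PTEST of the result and a conditional branch on
+;; the flags; comparing against an all-false predicate skips the EOR and
+;; tests operand 1 directly.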
+
+;; Unpredicated integer MIN/MAX.
+(define_expand "<su><maxmin><mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand")
+ (unspec:SVE_I
+ [(match_dup 3)
+ (MAXMIN:SVE_I (match_operand:SVE_I 1 "register_operand")
+ (match_operand:SVE_I 2 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Integer MIN/MAX predicated with a PTRUE.
+(define_insn "*<su><maxmin><mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w")
+ (unspec:SVE_I
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (MAXMIN:SVE_I (match_operand:SVE_I 2 "register_operand" "%0")
+ (match_operand:SVE_I 3 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "<su><maxmin>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated floating-point MIN/MAX.
+(define_expand "<su><maxmin><mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 3)
+ (FMAXMIN:SVE_F (match_operand:SVE_F 1 "register_operand")
+ (match_operand:SVE_F 2 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Floating-point MIN/MAX predicated with a PTRUE.
+(define_insn "*<su><maxmin><mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (FMAXMIN:SVE_F (match_operand:SVE_F 2 "register_operand" "%0")
+ (match_operand:SVE_F 3 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "f<maxmin>nm\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated fmin/fmax.
+(define_expand "<maxmin_uns><mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 3)
+ (unspec:SVE_F [(match_operand:SVE_F 1 "register_operand")
+ (match_operand:SVE_F 2 "register_operand")]
+ FMAXMIN_UNS)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; fmin/fmax predicated with a PTRUE.
+(define_insn "*<maxmin_uns><mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (unspec:SVE_F [(match_operand:SVE_F 2 "register_operand" "%0")
+ (match_operand:SVE_F 3 "register_operand" "w")]
+ FMAXMIN_UNS)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "<maxmin_uns_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated integer add reduction.
+(define_expand "reduc_plus_scal_<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand")
+ (unspec:<VEL> [(match_dup 2)
+ (match_operand:SVE_I 1 "register_operand")]
+ UNSPEC_ADDV))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Predicated integer add reduction.  The result is always 64 bits.
+(define_insn "*reduc_plus_scal_<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand" "=w")
+ (unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (match_operand:SVE_I 2 "register_operand" "w")]
+ UNSPEC_ADDV))]
+ "TARGET_SVE"
+ "uaddv\t%d0, %1, %2.<Vetype>"
+)
+
+;; Unpredicated floating-point add reduction.
+(define_expand "reduc_plus_scal_<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand")
+ (unspec:<VEL> [(match_dup 2)
+ (match_operand:SVE_F 1 "register_operand")]
+ UNSPEC_FADDV))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Predicated floating-point add reduction.
+(define_insn "*reduc_plus_scal_<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand" "=w")
+ (unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (match_operand:SVE_F 2 "register_operand" "w")]
+ UNSPEC_FADDV))]
+ "TARGET_SVE"
+ "faddv\t%<Vetype>0, %1, %2.<Vetype>"
+)
+
+;; Unpredicated integer MIN/MAX reduction.
+(define_expand "reduc_<maxmin_uns>_scal_<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand")
+ (unspec:<VEL> [(match_dup 2)
+ (match_operand:SVE_I 1 "register_operand")]
+ MAXMINV))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Predicated integer MIN/MAX reduction.
+(define_insn "*reduc_<maxmin_uns>_scal_<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand" "=w")
+ (unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (match_operand:SVE_I 2 "register_operand" "w")]
+ MAXMINV))]
+ "TARGET_SVE"
+ "<maxmin_uns_op>v\t%<Vetype>0, %1, %2.<Vetype>"
+)
+
+;; Unpredicated floating-point MIN/MAX reduction.
+(define_expand "reduc_<maxmin_uns>_scal_<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand")
+ (unspec:<VEL> [(match_dup 2)
+ (match_operand:SVE_F 1 "register_operand")]
+ FMAXMINV))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Predicated floating-point MIN/MAX reduction.
+(define_insn "*reduc_<maxmin_uns>_scal_<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand" "=w")
+ (unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (match_operand:SVE_F 2 "register_operand" "w")]
+ FMAXMINV))]
+ "TARGET_SVE"
+ "<maxmin_uns_op>v\t%<Vetype>0, %1, %2.<Vetype>"
+)
+
+;; Unpredicated floating-point addition.
+(define_expand "add<mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 3)
+ (plus:SVE_F
+ (match_operand:SVE_F 1 "register_operand")
+ (match_operand:SVE_F 2 "aarch64_sve_float_arith_with_sub_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Floating-point addition predicated with a PTRUE.
+(define_insn "*add<mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl, Upl")
+ (plus:SVE_F
+ (match_operand:SVE_F 2 "register_operand" "%0, 0, w")
+ (match_operand:SVE_F 3 "aarch64_sve_float_arith_with_sub_operand" "vsA, vsN, w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ fadd\t%0.<Vetype>, %1/m, %0.<Vetype>, #%3
+ fsub\t%0.<Vetype>, %1/m, %0.<Vetype>, #%N3
+ fadd\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated floating-point subtraction.
+(define_expand "sub<mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 3)
+ (minus:SVE_F
+ (match_operand:SVE_F 1 "aarch64_sve_float_arith_operand")
+ (match_operand:SVE_F 2 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Floating-point subtraction predicated with a PTRUE.
+(define_insn "*sub<mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w, w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl, Upl, Upl")
+ (minus:SVE_F
+ (match_operand:SVE_F 2 "aarch64_sve_float_arith_operand" "0, 0, vsA, w")
+ (match_operand:SVE_F 3 "aarch64_sve_float_arith_with_sub_operand" "vsA, vsN, 0, w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE
+ && (register_operand (operands[2], <MODE>mode)
+ || register_operand (operands[3], <MODE>mode))"
+ "@
+ fsub\t%0.<Vetype>, %1/m, %0.<Vetype>, #%3
+ fadd\t%0.<Vetype>, %1/m, %0.<Vetype>, #%N3
+ fsubr\t%0.<Vetype>, %1/m, %0.<Vetype>, #%2
+ fsub\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated floating-point multiplication.
+(define_expand "mul<mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 3)
+ (mult:SVE_F
+ (match_operand:SVE_F 1 "register_operand")
+ (match_operand:SVE_F 2 "aarch64_sve_float_mul_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Floating-point multiplication predicated with a PTRUE.
+(define_insn "*mul<mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (mult:SVE_F
+ (match_operand:SVE_F 2 "register_operand" "%0, w")
+ (match_operand:SVE_F 3 "aarch64_sve_float_mul_operand" "vsM, w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ fmul\t%0.<Vetype>, %1/m, %0.<Vetype>, #%3
+ fmul\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated fma (%0 = (%1 * %2) + %3).
+(define_expand "fma<mode>4"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 4)
+ (fma:SVE_F (match_operand:SVE_F 1 "register_operand")
+ (match_operand:SVE_F 2 "register_operand")
+ (match_operand:SVE_F 3 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[4] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; fma predicated with a PTRUE.
+(define_insn "*fma<mode>4"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (fma:SVE_F (match_operand:SVE_F 3 "register_operand" "%0, w")
+ (match_operand:SVE_F 4 "register_operand" "w, w")
+ (match_operand:SVE_F 2 "register_operand" "w, 0"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ fmad\t%0.<Vetype>, %1/m, %4.<Vetype>, %2.<Vetype>
+ fmla\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"
+)
+
+;; Unpredicated fnma (%0 = (-%1 * %2) + %3).
+(define_expand "fnma<mode>4"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 4)
+ (fma:SVE_F (neg:SVE_F
+ (match_operand:SVE_F 1 "register_operand"))
+ (match_operand:SVE_F 2 "register_operand")
+ (match_operand:SVE_F 3 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[4] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; fnma predicated with a PTRUE.
+(define_insn "*fnma<mode>4"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (fma:SVE_F (neg:SVE_F
+ (match_operand:SVE_F 3 "register_operand" "%0, w"))
+ (match_operand:SVE_F 4 "register_operand" "w, w")
+ (match_operand:SVE_F 2 "register_operand" "w, 0"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ fmsb\t%0.<Vetype>, %1/m, %4.<Vetype>, %2.<Vetype>
+ fmls\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"
+)
+
+;; Unpredicated fms (%0 = (%1 * %2) - %3).
+(define_expand "fms<mode>4"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 4)
+ (fma:SVE_F (match_operand:SVE_F 1 "register_operand")
+ (match_operand:SVE_F 2 "register_operand")
+ (neg:SVE_F
+ (match_operand:SVE_F 3 "register_operand")))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[4] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; fms predicated with a PTRUE.
+(define_insn "*fms<mode>4"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (fma:SVE_F (match_operand:SVE_F 3 "register_operand" "%0, w")
+ (match_operand:SVE_F 4 "register_operand" "w, w")
+ (neg:SVE_F
+ (match_operand:SVE_F 2 "register_operand" "w, 0")))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ fnmsb\t%0.<Vetype>, %1/m, %4.<Vetype>, %2.<Vetype>
+ fnmls\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"
+)
+
+;; Unpredicated fnms (%0 = (-%1 * %2) - %3).
+(define_expand "fnms<mode>4"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 4)
+ (fma:SVE_F (neg:SVE_F
+ (match_operand:SVE_F 1 "register_operand"))
+ (match_operand:SVE_F 2 "register_operand")
+ (neg:SVE_F
+ (match_operand:SVE_F 3 "register_operand")))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[4] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; fnms predicated with a PTRUE.
+(define_insn "*fnms<mode>4"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (fma:SVE_F (neg:SVE_F
+ (match_operand:SVE_F 3 "register_operand" "%0, w"))
+ (match_operand:SVE_F 4 "register_operand" "w, w")
+ (neg:SVE_F
+ (match_operand:SVE_F 2 "register_operand" "w, 0")))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ fnmad\t%0.<Vetype>, %1/m, %4.<Vetype>, %2.<Vetype>
+ fnmla\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"
+)
+
+;; Unpredicated floating-point division.
+(define_expand "div<mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 3)
+ (div:SVE_F (match_operand:SVE_F 1 "register_operand")
+ (match_operand:SVE_F 2 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Floating-point division predicated with a PTRUE.
+(define_insn "*div<mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (div:SVE_F (match_operand:SVE_F 2 "register_operand" "0, w")
+ (match_operand:SVE_F 3 "register_operand" "w, 0"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ fdiv\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
+ fdivr\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype>"
+)
+
+;; Unpredicated FNEG, FABS and FSQRT.
+(define_expand "<optab><mode>2"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 2)
+ (SVE_FP_UNARY:SVE_F (match_operand:SVE_F 1 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; FNEG, FABS and FSQRT predicated with a PTRUE.
+(define_insn "*<optab><mode>2"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (SVE_FP_UNARY:SVE_F (match_operand:SVE_F 2 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "<sve_fp_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
+)
+
+;; Unpredicated FRINTy.
+(define_expand "<frint_pattern><mode>2"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 2)
+ (unspec:SVE_F [(match_operand:SVE_F 1 "register_operand")]
+ FRINT)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; FRINTy predicated with a PTRUE.
+(define_insn "*<frint_pattern><mode>2"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (unspec:SVE_F [(match_operand:SVE_F 2 "register_operand" "w")]
+ FRINT)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "frint<frint_suffix>\t%0.<Vetype>, %1/m, %2.<Vetype>"
+)
+
+;; Unpredicated conversion of floats to integers of the same size (HF to HI,
+;; SF to SI or DF to DI).
+(define_expand "<fix_trunc_optab><mode><v_int_equiv>2"
+ [(set (match_operand:<V_INT_EQUIV> 0 "register_operand")
+ (unspec:<V_INT_EQUIV>
+ [(match_dup 2)
+ (FIXUORS:<V_INT_EQUIV>
+ (match_operand:SVE_F 1 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Conversion of HF to DI, SI or HI, predicated with a PTRUE.
+(define_insn "*<fix_trunc_optab>v16hsf<mode>2"
+ [(set (match_operand:SVE_HSDI 0 "register_operand" "=w")
+ (unspec:SVE_HSDI
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (FIXUORS:SVE_HSDI
+ (match_operand:VNx8HF 2 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "fcvtz<su>\t%0.<Vetype>, %1/m, %2.h"
+)
+
+;; Conversion of SF to DI or SI, predicated with a PTRUE.
+(define_insn "*<fix_trunc_optab>vnx4sf<mode>2"
+ [(set (match_operand:SVE_SDI 0 "register_operand" "=w")
+ (unspec:SVE_SDI
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (FIXUORS:SVE_SDI
+ (match_operand:VNx4SF 2 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "fcvtz<su>\t%0.<Vetype>, %1/m, %2.s"
+)
+
+;; Conversion of DF to DI or SI, predicated with a PTRUE.
+(define_insn "*<fix_trunc_optab>vnx2df<mode>2"
+ [(set (match_operand:SVE_SDI 0 "register_operand" "=w")
+ (unspec:SVE_SDI
+ [(match_operand:VNx2BI 1 "register_operand" "Upl")
+ (FIXUORS:SVE_SDI
+ (match_operand:VNx2DF 2 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "fcvtz<su>\t%0.<Vetype>, %1/m, %2.d"
+)
+
+;; Unpredicated conversion of integers to floats of the same size
+;; (HI to HF, SI to SF or DI to DF).
+(define_expand "<optab><v_int_equiv><mode>2"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 2)
+ (FLOATUORS:SVE_F
+ (match_operand:<V_INT_EQUIV> 1 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Conversion of DI, SI or HI to the same number of HFs, predicated
+;; with a PTRUE.
+(define_insn "*<optab><mode>vnx8hf2"
+ [(set (match_operand:VNx8HF 0 "register_operand" "=w")
+ (unspec:VNx8HF
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (FLOATUORS:VNx8HF
+ (match_operand:SVE_HSDI 2 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "<su_optab>cvtf\t%0.h, %1/m, %2.<Vetype>"
+)
+
+;; Conversion of DI or SI to the same number of SFs, predicated with a PTRUE.
+(define_insn "*<optab><mode>vnx4sf2"
+ [(set (match_operand:VNx4SF 0 "register_operand" "=w")
+ (unspec:VNx4SF
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (FLOATUORS:VNx4SF
+ (match_operand:SVE_SDI 2 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "<su_optab>cvtf\t%0.s, %1/m, %2.<Vetype>"
+)
+
+;; Conversion of DI or SI to DF, predicated with a PTRUE.
+(define_insn "*<optab><mode>vnx2df2"
+ [(set (match_operand:VNx2DF 0 "register_operand" "=w")
+ (unspec:VNx2DF
+ [(match_operand:VNx2BI 1 "register_operand" "Upl")
+ (FLOATUORS:VNx2DF
+ (match_operand:SVE_SDI 2 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "<su_optab>cvtf\t%0.d, %1/m, %2.<Vetype>"
+)
+
+;; Conversion of DFs to the same number of SFs, or SFs to the same number
+;; of HFs.
+(define_insn "*trunc<Vwide><mode>2"
+ [(set (match_operand:SVE_HSF 0 "register_operand" "=w")
+ (unspec:SVE_HSF
+ [(match_operand:<VWIDE_PRED> 1 "register_operand" "Upl")
+ (unspec:SVE_HSF
+ [(match_operand:<VWIDE> 2 "register_operand" "w")]
+ UNSPEC_FLOAT_CONVERT)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "fcvt\t%0.<Vetype>, %1/m, %2.<Vewtype>"
+)
+
+;; Conversion of SFs to the same number of DFs, or HFs to the same number
+;; of SFs.
+(define_insn "*extend<mode><Vwide>2"
+ [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+ (unspec:<VWIDE>
+ [(match_operand:<VWIDE_PRED> 1 "register_operand" "Upl")
+ (unspec:<VWIDE>
+ [(match_operand:SVE_HSF 2 "register_operand" "w")]
+ UNSPEC_FLOAT_CONVERT)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "fcvt\t%0.<Vewtype>, %1/m, %2.<Vetype>"
+)
+
+;; PUNPKHI and PUNPKLO.
+(define_insn "vec_unpack<su>_<perm_hilo>_<mode>"
+ [(set (match_operand:<VWIDE> 0 "register_operand" "=Upa")
+ (unspec:<VWIDE> [(match_operand:PRED_BHS 1 "register_operand" "Upa")]
+ UNPACK))]
+ "TARGET_SVE"
+ "punpk<perm_hilo>\t%0.h, %1.b"
+)
+
+;; SUNPKHI, UUNPKHI, SUNPKLO and UUNPKLO.
+(define_insn "vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>"
+ [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+ (unspec:<VWIDE> [(match_operand:SVE_BHSI 1 "register_operand" "w")]
+ UNPACK))]
+ "TARGET_SVE"
+ "<su>unpk<perm_hilo>\t%0.<Vewtype>, %1.<Vetype>"
+)
+
+;; Used by the vec_unpacks_<perm_hilo>_<mode> expander to unpack the bit
+;; representation of a VNx4SF or VNx8HF without conversion. The choice
+;; between signed and unsigned isn't significant.
+(define_insn "*vec_unpacku_<perm_hilo>_<mode>_no_convert"
+ [(set (match_operand:SVE_HSF 0 "register_operand" "=w")
+ (unspec:SVE_HSF [(match_operand:SVE_HSF 1 "register_operand" "w")]
+ UNPACK_UNSIGNED))]
+ "TARGET_SVE"
+ "uunpk<perm_hilo>\t%0.<Vewtype>, %1.<Vetype>"
+)
+
+;; Unpack one half of a VNx4SF to VNx2DF, or one half of a VNx8HF to VNx4SF.
+;; First unpack the source without conversion, then float-convert the
+;; unpacked source.
+(define_expand "vec_unpacks_<perm_hilo>_<mode>"
+ [(set (match_dup 2)
+ (unspec:SVE_HSF [(match_operand:SVE_HSF 1 "register_operand")]
+ UNPACK_UNSIGNED))
+ (set (match_operand:<VWIDE> 0 "register_operand")
+ (unspec:<VWIDE> [(match_dup 3)
+ (unspec:<VWIDE> [(match_dup 2)] UNSPEC_FLOAT_CONVERT)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[2] = gen_reg_rtx (<MODE>mode);
+ operands[3] = force_reg (<VWIDE_PRED>mode, CONSTM1_RTX (<VWIDE_PRED>mode));
+ }
+)
+
+;; Unpack one half of a VNx4SI to VNx2DF. First unpack from VNx4SI
+;; to VNx2DI, reinterpret the VNx2DI as a VNx4SI, then convert the
+;; unpacked VNx4SI to VNx2DF.
+(define_expand "vec_unpack<su_optab>_float_<perm_hilo>_vnx4si"
+ [(set (match_dup 2)
+ (unspec:VNx2DI [(match_operand:VNx4SI 1 "register_operand")]
+ UNPACK_UNSIGNED))
+ (set (match_operand:VNx2DF 0 "register_operand")
+ (unspec:VNx2DF [(match_dup 3)
+ (FLOATUORS:VNx2DF (match_dup 4))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[2] = gen_reg_rtx (VNx2DImode);
+ operands[3] = force_reg (VNx2BImode, CONSTM1_RTX (VNx2BImode));
+ operands[4] = gen_rtx_SUBREG (VNx4SImode, operands[2], 0);
+ }
+)
+
+;; Predicate pack. Use UZP1 on the narrower type, which discards
+;; the high part of each wide element.
+(define_insn "vec_pack_trunc_<Vwide>"
+ [(set (match_operand:PRED_BHS 0 "register_operand" "=Upa")
+ (unspec:PRED_BHS
+ [(match_operand:<VWIDE> 1 "register_operand" "Upa")
+ (match_operand:<VWIDE> 2 "register_operand" "Upa")]
+ UNSPEC_PACK))]
+ "TARGET_SVE"
+ "uzp1\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+;; Integer pack. Use UZP1 on the narrower type, which discards
+;; the high part of each wide element.
+(define_insn "vec_pack_trunc_<Vwide>"
+ [(set (match_operand:SVE_BHSI 0 "register_operand" "=w")
+ (unspec:SVE_BHSI
+ [(match_operand:<VWIDE> 1 "register_operand" "w")
+ (match_operand:<VWIDE> 2 "register_operand" "w")]
+ UNSPEC_PACK))]
+ "TARGET_SVE"
+ "uzp1\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+;; Convert two vectors of DF to SF, or two vectors of SF to HF, and pack
+;; the results into a single vector.
+(define_expand "vec_pack_trunc_<Vwide>"
+ [(set (match_dup 4)
+ (unspec:SVE_HSF
+ [(match_dup 3)
+ (unspec:SVE_HSF [(match_operand:<VWIDE> 1 "register_operand")]
+ UNSPEC_FLOAT_CONVERT)]
+ UNSPEC_MERGE_PTRUE))
+ (set (match_dup 5)
+ (unspec:SVE_HSF
+ [(match_dup 3)
+ (unspec:SVE_HSF [(match_operand:<VWIDE> 2 "register_operand")]
+ UNSPEC_FLOAT_CONVERT)]
+ UNSPEC_MERGE_PTRUE))
+ (set (match_operand:SVE_HSF 0 "register_operand")
+ (unspec:SVE_HSF [(match_dup 4) (match_dup 5)] UNSPEC_UZP1))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VWIDE_PRED>mode, CONSTM1_RTX (<VWIDE_PRED>mode));
+ operands[4] = gen_reg_rtx (<MODE>mode);
+ operands[5] = gen_reg_rtx (<MODE>mode);
+ }
+)
+
+;; Convert two vectors of DF to SI and pack the results into a single vector.
+(define_expand "vec_pack_<su>fix_trunc_vnx2df"
+ [(set (match_dup 4)
+ (unspec:VNx4SI
+ [(match_dup 3)
+ (FIXUORS:VNx4SI (match_operand:VNx2DF 1 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))
+ (set (match_dup 5)
+ (unspec:VNx4SI
+ [(match_dup 3)
+ (FIXUORS:VNx4SI (match_operand:VNx2DF 2 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))
+ (set (match_operand:VNx4SI 0 "register_operand")
+ (unspec:VNx4SI [(match_dup 4) (match_dup 5)] UNSPEC_UZP1))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (VNx2BImode, CONSTM1_RTX (VNx2BImode));
+ operands[4] = gen_reg_rtx (VNx4SImode);
+ operands[5] = gen_reg_rtx (VNx4SImode);
+ }
+)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ae44c2a..c5ed870 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -67,8 +67,10 @@
#include "sched-int.h"
#include "target-globals.h"
#include "common/common-target.h"
+#include "cfgrtl.h"
#include "selftest.h"
#include "selftest-rtl.h"
+#include "rtx-vector-builder.h"
/* This file should be included last. */
#include "target-def.h"
@@ -129,13 +131,18 @@ struct simd_immediate_info
simd_immediate_info (scalar_int_mode, unsigned HOST_WIDE_INT,
insn_type = MOV, modifier_type = LSL,
unsigned int = 0);
+ simd_immediate_info (scalar_mode, rtx, rtx);
/* The mode of the elements. */
scalar_mode elt_mode;
- /* The value of each element. */
+ /* The value of each element if all elements are the same, or the
+ first value if the constant is a series. */
rtx value;
+ /* The value of the step if the constant is a series, null otherwise. */
+ rtx step;
+
/* The instruction to use to move the immediate into a vector. */
insn_type insn;
@@ -149,7 +156,7 @@ struct simd_immediate_info
ELT_MODE_IN and value VALUE_IN. */
inline simd_immediate_info
::simd_immediate_info (scalar_float_mode elt_mode_in, rtx value_in)
- : elt_mode (elt_mode_in), value (value_in), insn (MOV),
+ : elt_mode (elt_mode_in), value (value_in), step (NULL_RTX), insn (MOV),
modifier (LSL), shift (0)
{}
@@ -162,12 +169,23 @@ inline simd_immediate_info
insn_type insn_in, modifier_type modifier_in,
unsigned int shift_in)
: elt_mode (elt_mode_in), value (gen_int_mode (value_in, elt_mode_in)),
- insn (insn_in), modifier (modifier_in), shift (shift_in)
+ step (NULL_RTX), insn (insn_in), modifier (modifier_in), shift (shift_in)
+{}
+
+/* Construct an integer immediate in which each element has mode ELT_MODE_IN
+ and where element I is equal to VALUE_IN + I * STEP_IN. */
+inline simd_immediate_info
+::simd_immediate_info (scalar_mode elt_mode_in, rtx value_in, rtx step_in)
+ : elt_mode (elt_mode_in), value (value_in), step (step_in), insn (MOV),
+ modifier (LSL), shift (0)
{}
/* The current code model. */
enum aarch64_code_model aarch64_cmodel;
+/* The number of 64-bit elements in an SVE vector. */
+poly_uint16 aarch64_sve_vg;
+
#ifdef HAVE_AS_TLS
#undef TARGET_HAVE_TLS
#define TARGET_HAVE_TLS 1
@@ -187,8 +205,7 @@ static bool aarch64_builtin_support_vector_misalignment (machine_mode mode,
const_tree type,
int misalignment,
bool is_packed);
-static machine_mode
-aarch64_simd_container_mode (scalar_mode mode, unsigned width);
+static machine_mode aarch64_simd_container_mode (scalar_mode, poly_int64);
static bool aarch64_print_ldpstp_address (FILE *, machine_mode, rtx);
/* Major revision number of the ARM Architecture implemented by the target. */
@@ -1100,25 +1117,95 @@ aarch64_dbx_register_number (unsigned regno)
return AARCH64_DWARF_SP;
else if (FP_REGNUM_P (regno))
return AARCH64_DWARF_V0 + regno - V0_REGNUM;
+ else if (PR_REGNUM_P (regno))
+ return AARCH64_DWARF_P0 + regno - P0_REGNUM;
+ else if (regno == VG_REGNUM)
+ return AARCH64_DWARF_VG;
/* Return values >= DWARF_FRAME_REGISTERS indicate that there is no
equivalent DWARF register. */
return DWARF_FRAME_REGISTERS;
}
-/* Return TRUE if MODE is any of the large INT modes. */
+/* Return true if MODE is any of the Advanced SIMD structure modes. */
static bool
-aarch64_vect_struct_mode_p (machine_mode mode)
+aarch64_advsimd_struct_mode_p (machine_mode mode)
{
- return mode == OImode || mode == CImode || mode == XImode;
+ return (TARGET_SIMD
+ && (mode == OImode || mode == CImode || mode == XImode));
}
-/* Return TRUE if MODE is any of the vector modes. */
+/* Return true if MODE is an SVE predicate mode. */
static bool
-aarch64_vector_mode_p (machine_mode mode)
+aarch64_sve_pred_mode_p (machine_mode mode)
+{
+ return (TARGET_SVE
+ && (mode == VNx16BImode
+ || mode == VNx8BImode
+ || mode == VNx4BImode
+ || mode == VNx2BImode));
+}
+
+/* Three mutually-exclusive flags describing a vector or predicate type. */
+const unsigned int VEC_ADVSIMD = 1;
+const unsigned int VEC_SVE_DATA = 2;
+const unsigned int VEC_SVE_PRED = 4;
+/* Can be used in combination with VEC_ADVSIMD or VEC_SVE_DATA to indicate
+ a structure of 2, 3 or 4 vectors. */
+const unsigned int VEC_STRUCT = 8;
+/* Useful combinations of the above. */
+const unsigned int VEC_ANY_SVE = VEC_SVE_DATA | VEC_SVE_PRED;
+const unsigned int VEC_ANY_DATA = VEC_ADVSIMD | VEC_SVE_DATA;
+
+/* Return a set of flags describing the vector properties of mode MODE.
+ Ignore modes that are not supported by the current target. */
+static unsigned int
+aarch64_classify_vector_mode (machine_mode mode)
{
- return aarch64_vector_mode_supported_p (mode)
- || aarch64_vect_struct_mode_p (mode);
+ if (aarch64_advsimd_struct_mode_p (mode))
+ return VEC_ADVSIMD | VEC_STRUCT;
+
+ if (aarch64_sve_pred_mode_p (mode))
+ return VEC_SVE_PRED;
+
+ scalar_mode inner = GET_MODE_INNER (mode);
+ if (VECTOR_MODE_P (mode)
+ && (inner == QImode
+ || inner == HImode
+ || inner == HFmode
+ || inner == SImode
+ || inner == SFmode
+ || inner == DImode
+ || inner == DFmode))
+ {
+ if (TARGET_SVE
+ && known_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR))
+ return VEC_SVE_DATA;
+
+ /* This includes V1DF but not V1DI (which doesn't exist). */
+ if (TARGET_SIMD
+ && (known_eq (GET_MODE_BITSIZE (mode), 64)
+ || known_eq (GET_MODE_BITSIZE (mode), 128)))
+ return VEC_ADVSIMD;
+ }
+
+ return 0;
+}
+
+/* Return true if MODE is any of the data vector modes, including
+ structure modes. */
+static bool
+aarch64_vector_data_mode_p (machine_mode mode)
+{
+ return aarch64_classify_vector_mode (mode) & VEC_ANY_DATA;
+}
+
+/* Return true if MODE is an SVE data vector mode; either a single vector
+ or a structure of vectors. */
+static bool
+aarch64_sve_data_mode_p (machine_mode mode)
+{
+ return aarch64_classify_vector_mode (mode) & VEC_SVE_DATA;
}
/* Implement target hook TARGET_ARRAY_MODE_SUPPORTED_P. */
@@ -1135,6 +1222,42 @@ aarch64_array_mode_supported_p (machine_mode mode,
return false;
}
+/* Return the SVE predicate mode to use for elements that have
+ ELEM_NBYTES bytes, if such a mode exists. */
+
+opt_machine_mode
+aarch64_sve_pred_mode (unsigned int elem_nbytes)
+{
+ if (TARGET_SVE)
+ {
+ if (elem_nbytes == 1)
+ return VNx16BImode;
+ if (elem_nbytes == 2)
+ return VNx8BImode;
+ if (elem_nbytes == 4)
+ return VNx4BImode;
+ if (elem_nbytes == 8)
+ return VNx2BImode;
+ }
+ return opt_machine_mode ();
+}
+
+/* Implement TARGET_VECTORIZE_GET_MASK_MODE. */
+
+static opt_machine_mode
+aarch64_get_mask_mode (poly_uint64 nunits, poly_uint64 nbytes)
+{
+ if (TARGET_SVE && known_eq (nbytes, BYTES_PER_SVE_VECTOR))
+ {
+ unsigned int elem_nbytes = vector_element_size (nbytes, nunits);
+ machine_mode pred_mode;
+ if (aarch64_sve_pred_mode (elem_nbytes).exists (&pred_mode))
+ return pred_mode;
+ }
+
+ return default_get_mask_mode (nunits, nbytes);
+}
+
/* Implement TARGET_HARD_REGNO_NREGS. */
static unsigned int
@@ -1149,7 +1272,14 @@ aarch64_hard_regno_nregs (unsigned regno, machine_mode mode)
{
case FP_REGS:
case FP_LO_REGS:
+ if (aarch64_sve_data_mode_p (mode))
+ return exact_div (GET_MODE_SIZE (mode),
+ BYTES_PER_SVE_VECTOR).to_constant ();
return CEIL (lowest_size, UNITS_PER_VREG);
+ case PR_REGS:
+ case PR_LO_REGS:
+ case PR_HI_REGS:
+ return 1;
default:
return CEIL (lowest_size, UNITS_PER_WORD);
}
@@ -1164,6 +1294,17 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
if (GET_MODE_CLASS (mode) == MODE_CC)
return regno == CC_REGNUM;
+ if (regno == VG_REGNUM)
+ /* This must have the same size as _Unwind_Word. */
+ return mode == DImode;
+
+ unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+ if (vec_flags & VEC_SVE_PRED)
+ return PR_REGNUM_P (regno);
+
+ if (PR_REGNUM_P (regno))
+ return 0;
+
if (regno == SP_REGNUM)
/* The purpose of comparing with ptr_mode is to support the
global register variable associated with the stack pointer
@@ -1173,15 +1314,15 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
if (regno == FRAME_POINTER_REGNUM || regno == ARG_POINTER_REGNUM)
return mode == Pmode;
- if (GP_REGNUM_P (regno) && ! aarch64_vect_struct_mode_p (mode))
+ if (GP_REGNUM_P (regno) && known_le (GET_MODE_SIZE (mode), 16))
return true;
if (FP_REGNUM_P (regno))
{
- if (aarch64_vect_struct_mode_p (mode))
+ if (vec_flags & VEC_STRUCT)
return end_hard_regno (mode, regno) - 1 <= V31_REGNUM;
else
- return true;
+ return !VECTOR_MODE_P (mode) || vec_flags != 0;
}
return false;
@@ -1197,10 +1338,39 @@ aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
}
+/* Implement REGMODE_NATURAL_SIZE. */
+poly_uint64
+aarch64_regmode_natural_size (machine_mode mode)
+{
+ /* The natural size for SVE data modes is one SVE data vector,
+ and similarly for predicates. We can't independently modify
+ anything smaller than that. */
+ /* ??? For now, only do this for variable-width SVE registers.
+ Doing it for constant-sized registers breaks lower-subreg.c. */
+ /* ??? And once that's fixed, we should probably have similar
+ code for Advanced SIMD. */
+ if (!aarch64_sve_vg.is_constant ())
+ {
+ unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+ if (vec_flags & VEC_SVE_PRED)
+ return BYTES_PER_SVE_PRED;
+ if (vec_flags & VEC_SVE_DATA)
+ return BYTES_PER_SVE_VECTOR;
+ }
+ return UNITS_PER_WORD;
+}
+
/* Implement HARD_REGNO_CALLER_SAVE_MODE. */
machine_mode
-aarch64_hard_regno_caller_save_mode (unsigned, unsigned, machine_mode mode)
-{
+aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned,
+ machine_mode mode)
+{
+ /* The predicate mode determines which bits are significant and
+ which are "don't care". Decreasing the number of lanes would
+ lose data while increasing the number of lanes would make bits
+ unnecessarily significant. */
+ if (PR_REGNUM_P (regno))
+ return mode;
if (known_ge (GET_MODE_SIZE (mode), 4))
return mode;
else
@@ -1886,6 +2056,200 @@ aarch64_force_temporary (machine_mode mode, rtx x, rtx value)
}
}
+/* Return true if we can move VALUE into a register using a single
+ CNT[BHWD] instruction. */
+
+static bool
+aarch64_sve_cnt_immediate_p (poly_int64 value)
+{
+ HOST_WIDE_INT factor = value.coeffs[0];
+ /* The coefficient must be [1, 16] * {2, 4, 8, 16}. */
+ return (value.coeffs[1] == factor
+ && IN_RANGE (factor, 2, 16 * 16)
+ && (factor & 1) == 0
+ && factor <= 16 * (factor & -factor));
+}
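As a sanity check of the ranges in the test above, here is a minimal standalone C sketch (separate from the patch) that mirrors the condition on plain integers, assuming the offset has already been decomposed into its two poly_int64 coefficients (which must be equal for a CNT immediate, and equal the count per 128-bit quadword VQ):

#include <stdbool.h>
#include <stdio.h>

static bool
sve_cnt_immediate_p (long long c0, long long c1)
{
  long long factor = c0;
  return (c1 == factor
          && factor >= 2 && factor <= 16 * 16
          && (factor & 1) == 0
          && factor <= 16 * (factor & -factor));
}

int
main (void)
{
  /* 2*VQ (CNTD), 16*VQ (CNTB) and 256*VQ (CNTB ..., MUL #16) pass;
     17*VQ (odd) and 34*VQ (multiplier would exceed 16) do not.  */
  printf ("%d %d %d %d %d\n",
          sve_cnt_immediate_p (2, 2), sve_cnt_immediate_p (16, 16),
          sve_cnt_immediate_p (256, 256), sve_cnt_immediate_p (17, 17),
          sve_cnt_immediate_p (34, 34));
  return 0;
}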
+
+/* Likewise for rtx X. */
+
+bool
+aarch64_sve_cnt_immediate_p (rtx x)
+{
+ poly_int64 value;
+ return poly_int_rtx_p (x, &value) && aarch64_sve_cnt_immediate_p (value);
+}
+
+/* Return the asm string for an instruction with a CNT-like vector size
+ operand (a vector pattern followed by a multiplier in the range [1, 16]).
+ PREFIX is the mnemonic without the size suffix and OPERANDS is the
+ first part of the operands template (the part that comes before the
+ vector size itself). FACTOR is the count per 128-bit quadword.
+ NELTS_PER_VQ, if nonzero, is the number of elements in each quadword.
+ If it is zero, we can use any element size. */
+
+static char *
+aarch64_output_sve_cnt_immediate (const char *prefix, const char *operands,
+ unsigned int factor,
+ unsigned int nelts_per_vq)
+{
+ static char buffer[sizeof ("sqincd\t%x0, %w0, all, mul #16")];
+
+ if (nelts_per_vq == 0)
+ /* There is some overlap in the ranges of the four CNT instructions.
+ Here we always use the smallest possible element size, so that the
+ multiplier is 1 wherever possible. */
+ nelts_per_vq = factor & -factor;
+ int shift = std::min (exact_log2 (nelts_per_vq), 4);
+ gcc_assert (IN_RANGE (shift, 1, 4));
+ char suffix = "dwhb"[shift - 1];
+
+ factor >>= shift;
+ unsigned int written;
+ if (factor == 1)
+ written = snprintf (buffer, sizeof (buffer), "%s%c\t%s",
+ prefix, suffix, operands);
+ else
+ written = snprintf (buffer, sizeof (buffer), "%s%c\t%s, all, mul #%d",
+ prefix, suffix, operands, factor);
+ gcc_assert (written < sizeof (buffer));
+ return buffer;
+}
+
+/* Return the asm string for an instruction with a CNT-like vector size
+ operand (a vector pattern followed by a multiplier in the range [1, 16]).
+ PREFIX is the mnemonic without the size suffix and OPERANDS is the
+ first part of the operands template (the part that comes before the
+ vector size itself). X is the value of the vector size operand,
+ as a polynomial integer rtx. */
+
+char *
+aarch64_output_sve_cnt_immediate (const char *prefix, const char *operands,
+ rtx x)
+{
+ poly_int64 value = rtx_to_poly_int64 (x);
+ gcc_assert (aarch64_sve_cnt_immediate_p (value));
+ return aarch64_output_sve_cnt_immediate (prefix, operands,
+ value.coeffs[1], 0);
+}
+
+/* Return true if we can add VALUE to a register using a single ADDVL
+ or ADDPL instruction. */
+
+static bool
+aarch64_sve_addvl_addpl_immediate_p (poly_int64 value)
+{
+ HOST_WIDE_INT factor = value.coeffs[0];
+ if (factor == 0 || value.coeffs[1] != factor)
+ return false;
+ /* FACTOR counts VG / 2, so a value of 2 is one predicate width
+ and a value of 16 is one vector width. */
+ return (((factor & 15) == 0 && IN_RANGE (factor, -32 * 16, 31 * 16))
+ || ((factor & 1) == 0 && IN_RANGE (factor, -32 * 2, 31 * 2)));
+}
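The same kind of standalone sketch (again not part of the patch) can illustrate the ADDVL/ADDPL range, assuming FACTOR is the equal pair of coefficients, i.e. the byte count per 128-bit quadword: 2 is one predicate length (PL), 16 is one vector length (VL):

#include <stdbool.h>
#include <stdio.h>

static bool
addvl_addpl_immediate_p (long long factor)
{
  if (factor == 0)
    return false;
  /* ADDVL takes [-32, 31] vector lengths, ADDPL [-32, 31] predicate
     lengths.  */
  return ((factor % 16 == 0 && factor >= -32 * 16 && factor <= 31 * 16)
          || (factor % 2 == 0 && factor >= -32 * 2 && factor <= 31 * 2));
}

int
main (void)
{
  /* 1 VL (16), -32 PL (-64) and 31 VL (496) are encodable;
     33 PL (66) and odd factors (3) are not.  */
  printf ("%d %d %d %d %d\n",
          addvl_addpl_immediate_p (16), addvl_addpl_immediate_p (-64),
          addvl_addpl_immediate_p (496), addvl_addpl_immediate_p (66),
          addvl_addpl_immediate_p (3));
  return 0;
}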
+
+/* Likewise for rtx X. */
+
+bool
+aarch64_sve_addvl_addpl_immediate_p (rtx x)
+{
+ poly_int64 value;
+ return (poly_int_rtx_p (x, &value)
+ && aarch64_sve_addvl_addpl_immediate_p (value));
+}
+
+/* Return the asm string for adding ADDVL or ADDPL immediate X to operand 1
+ and storing the result in operand 0. */
+
+char *
+aarch64_output_sve_addvl_addpl (rtx dest, rtx base, rtx offset)
+{
+ static char buffer[sizeof ("addpl\t%x0, %x1, #-") + 3 * sizeof (int)];
+ poly_int64 offset_value = rtx_to_poly_int64 (offset);
+ gcc_assert (aarch64_sve_addvl_addpl_immediate_p (offset_value));
+
+ /* Use INC or DEC if possible. */
+ if (rtx_equal_p (dest, base) && GP_REGNUM_P (REGNO (dest)))
+ {
+ if (aarch64_sve_cnt_immediate_p (offset_value))
+ return aarch64_output_sve_cnt_immediate ("inc", "%x0",
+ offset_value.coeffs[1], 0);
+ if (aarch64_sve_cnt_immediate_p (-offset_value))
+ return aarch64_output_sve_cnt_immediate ("dec", "%x0",
+ -offset_value.coeffs[1], 0);
+ }
+
+ int factor = offset_value.coeffs[1];
+ if ((factor & 15) == 0)
+ snprintf (buffer, sizeof (buffer), "addvl\t%%x0, %%x1, #%d", factor / 16);
+ else
+ snprintf (buffer, sizeof (buffer), "addpl\t%%x0, %%x1, #%d", factor / 2);
+ return buffer;
+}
+
+/* Return true if X is a valid immediate for an SVE vector INC or DEC
+ instruction. If it is, store the number of elements in each vector
+ quadword in *NELTS_PER_VQ_OUT (if nonnull) and store the multiplication
+ factor in *FACTOR_OUT (if nonnull). */
+
+bool
+aarch64_sve_inc_dec_immediate_p (rtx x, int *factor_out,
+ unsigned int *nelts_per_vq_out)
+{
+ rtx elt;
+ poly_int64 value;
+
+ if (!const_vec_duplicate_p (x, &elt)
+ || !poly_int_rtx_p (elt, &value))
+ return false;
+
+ unsigned int nelts_per_vq = 128 / GET_MODE_UNIT_BITSIZE (GET_MODE (x));
+ if (nelts_per_vq != 8 && nelts_per_vq != 4 && nelts_per_vq != 2)
+ /* There's no vector INCB. */
+ return false;
+
+ HOST_WIDE_INT factor = value.coeffs[0];
+ if (value.coeffs[1] != factor)
+ return false;
+
+ /* The coefficient must be [1, 16] * NELTS_PER_VQ. */
+ if ((factor % nelts_per_vq) != 0
+ || !IN_RANGE (abs (factor), nelts_per_vq, 16 * nelts_per_vq))
+ return false;
+
+ if (factor_out)
+ *factor_out = factor;
+ if (nelts_per_vq_out)
+ *nelts_per_vq_out = nelts_per_vq;
+ return true;
+}
+
+/* Return true if X is a valid immediate for an SVE vector INC or DEC
+ instruction. */
+
+bool
+aarch64_sve_inc_dec_immediate_p (rtx x)
+{
+ return aarch64_sve_inc_dec_immediate_p (x, NULL, NULL);
+}
+
+/* Return the asm template for an SVE vector INC or DEC instruction.
+ OPERANDS gives the operands before the vector count and X is the
+ value of the vector count operand itself. */
+
+char *
+aarch64_output_sve_inc_dec_immediate (const char *operands, rtx x)
+{
+ int factor;
+ unsigned int nelts_per_vq;
+ if (!aarch64_sve_inc_dec_immediate_p (x, &factor, &nelts_per_vq))
+ gcc_unreachable ();
+ if (factor < 0)
+ return aarch64_output_sve_cnt_immediate ("dec", operands, -factor,
+ nelts_per_vq);
+ else
+ return aarch64_output_sve_cnt_immediate ("inc", operands, factor,
+ nelts_per_vq);
+}
static int
aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
@@ -2011,6 +2375,15 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
return num_insns;
}
+/* Return the number of temporary registers that aarch64_add_offset_1
+ would need to add OFFSET to a register. */
+
+static unsigned int
+aarch64_add_offset_1_temporaries (HOST_WIDE_INT offset)
+{
+ return abs_hwi (offset) < 0x1000000 ? 0 : 1;
+}
+
/* A subroutine of aarch64_add_offset. Set DEST to SRC + OFFSET for
a non-polynomial OFFSET. MODE is the mode of the addition.
FRAME_RELATED_P is true if the RTX_FRAME_RELATED flag should
@@ -2092,15 +2465,64 @@ aarch64_add_offset_1 (scalar_int_mode mode, rtx dest,
}
}
+/* Return the number of temporary registers that aarch64_add_offset
+ would need to move OFFSET into a register or add OFFSET to a register;
+ ADD_P is true if we want the latter rather than the former. */
+
+static unsigned int
+aarch64_offset_temporaries (bool add_p, poly_int64 offset)
+{
+ /* This follows the same structure as aarch64_add_offset. */
+ if (add_p && aarch64_sve_addvl_addpl_immediate_p (offset))
+ return 0;
+
+ unsigned int count = 0;
+ HOST_WIDE_INT factor = offset.coeffs[1];
+ HOST_WIDE_INT constant = offset.coeffs[0] - factor;
+ poly_int64 poly_offset (factor, factor);
+ if (add_p && aarch64_sve_addvl_addpl_immediate_p (poly_offset))
+ /* Need one register for the ADDVL/ADDPL result. */
+ count += 1;
+ else if (factor != 0)
+ {
+ factor = abs (factor);
+ if (factor > 16 * (factor & -factor))
+ /* Need one register for the CNT result and one for the multiplication
+ factor. If necessary, the second temporary can be reused for the
+ constant part of the offset. */
+ return 2;
+ /* Need one register for the CNT result (which might then
+ be shifted). */
+ count += 1;
+ }
+ return count + aarch64_add_offset_1_temporaries (constant);
+}
+
+/* If X can be represented as a poly_int64, return the number
+ of temporaries that are required to add it to a register.
+ Return -1 otherwise. */
+
+int
+aarch64_add_offset_temporaries (rtx x)
+{
+ poly_int64 offset;
+ if (!poly_int_rtx_p (x, &offset))
+ return -1;
+ return aarch64_offset_temporaries (true, offset);
+}
+
/* Set DEST to SRC + OFFSET. MODE is the mode of the addition.
FRAME_RELATED_P is true if the RTX_FRAME_RELATED flag should
be set and CFA adjustments added to the generated instructions.
TEMP1, if nonnull, is a register of mode MODE that can be used as a
temporary if register allocation is already complete. This temporary
- register may overlap DEST but must not overlap SRC. If TEMP1 is known
- to hold abs (OFFSET), EMIT_MOVE_IMM can be set to false to avoid emitting
- the immediate again.
+ register may overlap DEST if !FRAME_RELATED_P but must not overlap SRC.
+ If TEMP1 is known to hold abs (OFFSET), EMIT_MOVE_IMM can be set to
+ false to avoid emitting the immediate again.
+
+ TEMP2, if nonnull, is a second temporary register that doesn't
+ overlap either DEST or SRC.
Since this function may be used to adjust the stack pointer, we must
ensure that it cannot cause transient stack deallocation (for example
@@ -2109,27 +2531,177 @@ aarch64_add_offset_1 (scalar_int_mode mode, rtx dest,
static void
aarch64_add_offset (scalar_int_mode mode, rtx dest, rtx src,
- poly_int64 offset, rtx temp1, bool frame_related_p,
- bool emit_move_imm = true)
+ poly_int64 offset, rtx temp1, rtx temp2,
+ bool frame_related_p, bool emit_move_imm = true)
{
gcc_assert (emit_move_imm || temp1 != NULL_RTX);
gcc_assert (temp1 == NULL_RTX || !reg_overlap_mentioned_p (temp1, src));
+ gcc_assert (temp1 == NULL_RTX
+ || !frame_related_p
+ || !reg_overlap_mentioned_p (temp1, dest));
+ gcc_assert (temp2 == NULL_RTX || !reg_overlap_mentioned_p (dest, temp2));
+
+ /* Try using ADDVL or ADDPL to add the whole value. */
+ if (src != const0_rtx && aarch64_sve_addvl_addpl_immediate_p (offset))
+ {
+ rtx offset_rtx = gen_int_mode (offset, mode);
+ rtx_insn *insn = emit_insn (gen_add3_insn (dest, src, offset_rtx));
+ RTX_FRAME_RELATED_P (insn) = frame_related_p;
+ return;
+ }
+
+ /* Coefficient 1 is multiplied by the number of 128-bit blocks in an
+ SVE vector register, over and above the minimum size of 128 bits.
+ This is equivalent to half the value returned by CNTD with a
+ vector shape of ALL. */
+ HOST_WIDE_INT factor = offset.coeffs[1];
+ HOST_WIDE_INT constant = offset.coeffs[0] - factor;
+
+ /* Try using ADDVL or ADDPL to add the VG-based part. */
+ poly_int64 poly_offset (factor, factor);
+ if (src != const0_rtx
+ && aarch64_sve_addvl_addpl_immediate_p (poly_offset))
+ {
+ rtx offset_rtx = gen_int_mode (poly_offset, mode);
+ if (frame_related_p)
+ {
+ rtx_insn *insn = emit_insn (gen_add3_insn (dest, src, offset_rtx));
+ RTX_FRAME_RELATED_P (insn) = true;
+ src = dest;
+ }
+ else
+ {
+ rtx addr = gen_rtx_PLUS (mode, src, offset_rtx);
+ src = aarch64_force_temporary (mode, temp1, addr);
+ temp1 = temp2;
+ temp2 = NULL_RTX;
+ }
+ }
+ /* Otherwise use a CNT-based sequence. */
+ else if (factor != 0)
+ {
+ /* Use a subtraction if we have a negative factor. */
+ rtx_code code = PLUS;
+ if (factor < 0)
+ {
+ factor = -factor;
+ code = MINUS;
+ }
+
+ /* Calculate CNTD * FACTOR / 2. First try to fold the division
+ into the multiplication. */
+ rtx val;
+ int shift = 0;
+ if (factor & 1)
+ /* Use a right shift by 1. */
+ shift = -1;
+ else
+ factor /= 2;
+ HOST_WIDE_INT low_bit = factor & -factor;
+ if (factor <= 16 * low_bit)
+ {
+ if (factor > 16 * 8)
+ {
+ /* "CNTB Xn, ALL, MUL #FACTOR" is out of range, so calculate
+ the value with the minimum multiplier and shift it into
+ position. */
+ int extra_shift = exact_log2 (low_bit);
+ shift += extra_shift;
+ factor >>= extra_shift;
+ }
+ val = gen_int_mode (poly_int64 (factor * 2, factor * 2), mode);
+ }
+ else
+ {
+ /* Use CNTD, then multiply it by FACTOR. */
+ val = gen_int_mode (poly_int64 (2, 2), mode);
+ val = aarch64_force_temporary (mode, temp1, val);
+
+ /* Go back to using a negative multiplication factor if we have
+ no register from which to subtract. */
+ if (code == MINUS && src == const0_rtx)
+ {
+ factor = -factor;
+ code = PLUS;
+ }
+ rtx coeff1 = gen_int_mode (factor, mode);
+ coeff1 = aarch64_force_temporary (mode, temp2, coeff1);
+ val = gen_rtx_MULT (mode, val, coeff1);
+ }
+
+ if (shift > 0)
+ {
+ /* Multiply by 1 << SHIFT. */
+ val = aarch64_force_temporary (mode, temp1, val);
+ val = gen_rtx_ASHIFT (mode, val, GEN_INT (shift));
+ }
+ else if (shift == -1)
+ {
+ /* Divide by 2. */
+ val = aarch64_force_temporary (mode, temp1, val);
+ val = gen_rtx_ASHIFTRT (mode, val, const1_rtx);
+ }
+
+ /* Calculate SRC +/- CNTD * FACTOR / 2. */
+ if (src != const0_rtx)
+ {
+ val = aarch64_force_temporary (mode, temp1, val);
+ val = gen_rtx_fmt_ee (code, mode, src, val);
+ }
+ else if (code == MINUS)
+ {
+ val = aarch64_force_temporary (mode, temp1, val);
+ val = gen_rtx_NEG (mode, val);
+ }
+
+ if (constant == 0 || frame_related_p)
+ {
+ rtx_insn *insn = emit_insn (gen_rtx_SET (dest, val));
+ if (frame_related_p)
+ {
+ RTX_FRAME_RELATED_P (insn) = true;
+ add_reg_note (insn, REG_CFA_ADJUST_CFA,
+ gen_rtx_SET (dest, plus_constant (Pmode, src,
+ poly_offset)));
+ }
+ src = dest;
+ if (constant == 0)
+ return;
+ }
+ else
+ {
+ src = aarch64_force_temporary (mode, temp1, val);
+ temp1 = temp2;
+ temp2 = NULL_RTX;
+ }
+
+ emit_move_imm = true;
+ }
- /* SVE support will go here. */
- HOST_WIDE_INT constant = offset.to_constant ();
aarch64_add_offset_1 (mode, dest, src, constant, temp1,
frame_related_p, emit_move_imm);
}
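To make the CNT-based branch easier to follow, this is a standalone sketch (not in the patch) of just the decision logic for materialising the VL-dependent part of the offset, FACTOR * CNTD / 2, with FACTOR already made positive; it only prints the chosen plan, and the final addition/subtraction and RTL emission are assumed away:

#include <stdio.h>

static void
plan_cnt_sequence (long long factor)
{
  int shift = 0;
  if (factor & 1)
    shift = -1;                 /* fold the /2 into a final ASR #1 */
  else
    factor /= 2;
  long long low_bit = factor & -factor;
  if (factor <= 16 * low_bit)
    {
      if (factor > 16 * 8)
        {
          /* Out of range for a single multiplier, so use the minimum
             multiplier and shift the result into position.  */
          int extra = 0;
          while ((1LL << (extra + 1)) <= low_bit)
            extra++;
          shift += extra;
          factor >>= extra;
        }
      printf ("immediate %lld * CNTD", factor);
    }
  else
    printf ("CNTD in a register, multiplied by %lld", factor);
  if (shift > 0)
    printf (", then LSL #%d\n", shift);
  else if (shift == -1)
    printf (", then ASR #1\n");
  else
    printf ("\n");
}

int
main (void)
{
  plan_cnt_sequence (4);    /* immediate 2 * CNTD */
  plan_cnt_sequence (3);    /* immediate 3 * CNTD, then ASR #1 */
  plan_cnt_sequence (512);  /* immediate 1 * CNTD, then LSL #8 */
  plan_cnt_sequence (272);  /* CNTD in a register, multiplied by 136 */
  return 0;
}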
+/* Like aarch64_add_offset, but the offset is given as an rtx rather
+ than a poly_int64. */
+
+void
+aarch64_split_add_offset (scalar_int_mode mode, rtx dest, rtx src,
+ rtx offset_rtx, rtx temp1, rtx temp2)
+{
+ aarch64_add_offset (mode, dest, src, rtx_to_poly_int64 (offset_rtx),
+ temp1, temp2, false);
+}
+
/* Add DELTA to the stack pointer, marking the instructions frame-related.
TEMP1 is available as a temporary if nonnull. EMIT_MOVE_IMM is false
if TEMP1 already contains abs (DELTA). */
static inline void
-aarch64_add_sp (rtx temp1, poly_int64 delta, bool emit_move_imm)
+aarch64_add_sp (rtx temp1, rtx temp2, poly_int64 delta, bool emit_move_imm)
{
aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, delta,
- temp1, true, emit_move_imm);
+ temp1, temp2, true, emit_move_imm);
}
/* Subtract DELTA from the stack pointer, marking the instructions
@@ -2137,44 +2709,195 @@ aarch64_add_sp (rtx temp1, poly_int64 delta, bool emit_move_imm)
if nonnull. */
static inline void
-aarch64_sub_sp (rtx temp1, poly_int64 delta, bool frame_related_p)
+aarch64_sub_sp (rtx temp1, rtx temp2, poly_int64 delta, bool frame_related_p)
{
aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, -delta,
- temp1, frame_related_p);
+ temp1, temp2, frame_related_p);
}
-void
-aarch64_expand_mov_immediate (rtx dest, rtx imm)
+/* Set DEST to (vec_series BASE STEP). */
+
+static void
+aarch64_expand_vec_series (rtx dest, rtx base, rtx step)
{
machine_mode mode = GET_MODE (dest);
+ scalar_mode inner = GET_MODE_INNER (mode);
+
+ /* Each operand can be a register or an immediate in the range [-16, 15]. */
+ if (!aarch64_sve_index_immediate_p (base))
+ base = force_reg (inner, base);
+ if (!aarch64_sve_index_immediate_p (step))
+ step = force_reg (inner, step);
+
+ emit_set_insn (dest, gen_rtx_VEC_SERIES (mode, base, step));
+}
+
+/* Try to duplicate SRC into SVE register DEST, given that SRC is an
+ integer of mode SRC_MODE. Return true on success. */
+
+static bool
+aarch64_expand_sve_widened_duplicate (rtx dest, scalar_int_mode src_mode,
+ rtx src)
+{
+ /* If the constant is smaller than 128 bits, we can do the move
+ using a vector of SRC_MODEs. */
+ if (src_mode != TImode)
+ {
+ poly_uint64 count = exact_div (GET_MODE_SIZE (GET_MODE (dest)),
+ GET_MODE_SIZE (src_mode));
+ machine_mode dup_mode = mode_for_vector (src_mode, count).require ();
+ emit_move_insn (gen_lowpart (dup_mode, dest),
+ gen_const_vec_duplicate (dup_mode, src));
+ return true;
+ }
+
+ /* The bytes are loaded in little-endian order, so do a byteswap on
+ big-endian targets. */
+ if (BYTES_BIG_ENDIAN)
+ {
+ src = simplify_unary_operation (BSWAP, src_mode, src, src_mode);
+ if (!src)
+ return false;
+ }
+
+ /* Use LD1RQ to load the 128 bits from memory. */
+ src = force_const_mem (src_mode, src);
+ if (!src)
+ return false;
- gcc_assert (mode == SImode || mode == DImode);
+ /* Make sure that the address is legitimate. */
+ if (!aarch64_sve_ld1r_operand_p (src))
+ {
+ rtx addr = force_reg (Pmode, XEXP (src, 0));
+ src = replace_equiv_address (src, addr);
+ }
+
+ rtx ptrue = force_reg (VNx16BImode, CONSTM1_RTX (VNx16BImode));
+ emit_insn (gen_sve_ld1rq (gen_lowpart (VNx16QImode, dest), ptrue, src));
+ return true;
+}
+
+/* Expand a move of general CONST_VECTOR SRC into DEST, given that it
+ isn't a simple duplicate or series. */
+
+static void
+aarch64_expand_sve_const_vector (rtx dest, rtx src)
+{
+ machine_mode mode = GET_MODE (src);
+ unsigned int npatterns = CONST_VECTOR_NPATTERNS (src);
+ unsigned int nelts_per_pattern = CONST_VECTOR_NELTS_PER_PATTERN (src);
+ gcc_assert (npatterns > 1);
+
+ if (nelts_per_pattern == 1)
+ {
+ /* The constant is a repeating sequence of at least two elements,
+ where the repeating elements occupy no more than 128 bits.
+ Get an integer representation of the replicated value. */
+ unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns;
+ gcc_assert (int_bits <= 128);
+
+ scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
+ rtx int_value = simplify_gen_subreg (int_mode, src, mode, 0);
+ if (int_value
+ && aarch64_expand_sve_widened_duplicate (dest, int_mode, int_value))
+ return;
+ }
+
+ /* Expand each pattern individually. */
+ rtx_vector_builder builder;
+ auto_vec<rtx, 16> vectors (npatterns);
+ for (unsigned int i = 0; i < npatterns; ++i)
+ {
+ builder.new_vector (mode, 1, nelts_per_pattern);
+ for (unsigned int j = 0; j < nelts_per_pattern; ++j)
+ builder.quick_push (CONST_VECTOR_ELT (src, i + j * npatterns));
+ vectors.quick_push (force_reg (mode, builder.build ()));
+ }
+
+ /* Use permutes to interleave the separate vectors. */
+ while (npatterns > 1)
+ {
+ npatterns /= 2;
+ for (unsigned int i = 0; i < npatterns; ++i)
+ {
+ rtx tmp = (npatterns == 1 ? dest : gen_reg_rtx (mode));
+ rtvec v = gen_rtvec (2, vectors[i], vectors[i + npatterns]);
+ emit_set_insn (tmp, gen_rtx_UNSPEC (mode, v, UNSPEC_ZIP1));
+ vectors[i] = tmp;
+ }
+ }
+ gcc_assert (vectors[0] == dest);
+}
+
+/* Set DEST to immediate IMM. For SVE vector modes, GEN_VEC_DUPLICATE
+ is a pattern that can be used to set DEST to a replicated scalar
+ element. */
+
+void
+aarch64_expand_mov_immediate (rtx dest, rtx imm,
+ rtx (*gen_vec_duplicate) (rtx, rtx))
+{
+ machine_mode mode = GET_MODE (dest);
/* Check on what type of symbol it is. */
scalar_int_mode int_mode;
if ((GET_CODE (imm) == SYMBOL_REF
|| GET_CODE (imm) == LABEL_REF
- || GET_CODE (imm) == CONST)
+ || GET_CODE (imm) == CONST
+ || GET_CODE (imm) == CONST_POLY_INT)
&& is_a <scalar_int_mode> (mode, &int_mode))
{
- rtx mem, base, offset;
+ rtx mem;
+ poly_int64 offset;
+ HOST_WIDE_INT const_offset;
enum aarch64_symbol_type sty;
/* If we have (const (plus symbol offset)), separate out the offset
before we start classifying the symbol. */
- split_const (imm, &base, &offset);
+ rtx base = strip_offset (imm, &offset);
+
+ /* We must always add an offset involving VL separately, rather than
+ folding it into the relocation. */
+ if (!offset.is_constant (&const_offset))
+ {
+ if (base == const0_rtx && aarch64_sve_cnt_immediate_p (offset))
+ emit_insn (gen_rtx_SET (dest, imm));
+ else
+ {
+ /* Do arithmetic on 32-bit values if the result is smaller
+ than that. */
+ if (partial_subreg_p (int_mode, SImode))
+ {
+ /* It is invalid to do symbol calculations in modes
+ narrower than SImode. */
+ gcc_assert (base == const0_rtx);
+ dest = gen_lowpart (SImode, dest);
+ int_mode = SImode;
+ }
+ if (base != const0_rtx)
+ {
+ base = aarch64_force_temporary (int_mode, dest, base);
+ aarch64_add_offset (int_mode, dest, base, offset,
+ NULL_RTX, NULL_RTX, false);
+ }
+ else
+ aarch64_add_offset (int_mode, dest, base, offset,
+ dest, NULL_RTX, false);
+ }
+ return;
+ }
- sty = aarch64_classify_symbol (base, offset);
+ sty = aarch64_classify_symbol (base, const_offset);
switch (sty)
{
case SYMBOL_FORCE_TO_MEM:
- if (offset != const0_rtx
+ if (const_offset != 0
&& targetm.cannot_force_const_mem (int_mode, imm))
{
gcc_assert (can_create_pseudo_p ());
base = aarch64_force_temporary (int_mode, dest, base);
- aarch64_add_offset (int_mode, dest, base, INTVAL (offset),
- NULL_RTX, false);
+ aarch64_add_offset (int_mode, dest, base, const_offset,
+ NULL_RTX, NULL_RTX, false);
return;
}
@@ -2209,12 +2932,12 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
case SYMBOL_SMALL_GOT_4G:
case SYMBOL_TINY_GOT:
case SYMBOL_TINY_TLSIE:
- if (offset != const0_rtx)
+ if (const_offset != 0)
{
gcc_assert(can_create_pseudo_p ());
base = aarch64_force_temporary (int_mode, dest, base);
- aarch64_add_offset (int_mode, dest, base, INTVAL (offset),
- NULL_RTX, false);
+ aarch64_add_offset (int_mode, dest, base, const_offset,
+ NULL_RTX, NULL_RTX, false);
return;
}
/* FALLTHRU */
@@ -2235,13 +2958,36 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
if (!CONST_INT_P (imm))
{
- if (GET_CODE (imm) == HIGH)
+ rtx base, step, value;
+ if (GET_CODE (imm) == HIGH
+ || aarch64_simd_valid_immediate (imm, NULL))
emit_insn (gen_rtx_SET (dest, imm));
+ else if (const_vec_series_p (imm, &base, &step))
+ aarch64_expand_vec_series (dest, base, step);
+ else if (const_vec_duplicate_p (imm, &value))
+ {
+ /* If the constant is out of range of an SVE vector move,
+ load it from memory if we can, otherwise move it into
+ a register and use a DUP. */
+ scalar_mode inner_mode = GET_MODE_INNER (mode);
+ rtx op = force_const_mem (inner_mode, value);
+ if (!op)
+ op = force_reg (inner_mode, value);
+ else if (!aarch64_sve_ld1r_operand_p (op))
+ {
+ rtx addr = force_reg (Pmode, XEXP (op, 0));
+ op = replace_equiv_address (op, addr);
+ }
+ emit_insn (gen_vec_duplicate (dest, op));
+ }
+ else if (GET_CODE (imm) == CONST_VECTOR
+ && !GET_MODE_NUNITS (GET_MODE (imm)).is_constant ())
+ aarch64_expand_sve_const_vector (dest, imm);
else
- {
+ {
rtx mem = force_const_mem (mode, imm);
gcc_assert (mem);
- emit_insn (gen_rtx_SET (dest, mem));
+ emit_move_insn (dest, mem);
}
return;
@@ -2251,6 +2997,44 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
as_a <scalar_int_mode> (mode));
}
+/* Emit an SVE predicated move from SRC to DEST. PRED is a predicate
+ that is known to contain PTRUE. */
+
+void
+aarch64_emit_sve_pred_move (rtx dest, rtx pred, rtx src)
+{
+ emit_insn (gen_rtx_SET (dest, gen_rtx_UNSPEC (GET_MODE (dest),
+ gen_rtvec (2, pred, src),
+ UNSPEC_MERGE_PTRUE)));
+}
+
+/* Expand a pre-RA SVE data move from SRC to DEST in which at least one
+ operand is in memory. In this case we need to use the predicated LD1
+ and ST1 instead of LDR and STR, both for correctness on big-endian
+ targets and because LD1 and ST1 support a wider range of addressing modes.
+ PRED_MODE is the mode of the predicate.
+
+ See the comment at the head of aarch64-sve.md for details about the
+ big-endian handling. */
+
+void
+aarch64_expand_sve_mem_move (rtx dest, rtx src, machine_mode pred_mode)
+{
+ machine_mode mode = GET_MODE (dest);
+ rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+ if (!register_operand (src, mode)
+ && !register_operand (dest, mode))
+ {
+ rtx tmp = gen_reg_rtx (mode);
+ if (MEM_P (src))
+ aarch64_emit_sve_pred_move (tmp, ptrue, src);
+ else
+ emit_move_insn (tmp, src);
+ src = tmp;
+ }
+ aarch64_emit_sve_pred_move (dest, ptrue, src);
+}
+
static bool
aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
tree exp ATTRIBUTE_UNUSED)
@@ -2715,6 +3499,21 @@ aarch64_function_arg_boundary (machine_mode mode, const_tree type)
return MIN (MAX (alignment, PARM_BOUNDARY), STACK_BOUNDARY);
}
+/* Implement TARGET_GET_RAW_RESULT_MODE and TARGET_GET_RAW_ARG_MODE. */
+
+static fixed_size_mode
+aarch64_get_reg_raw_mode (int regno)
+{
+ if (TARGET_SVE && FP_REGNUM_P (regno))
+ /* Don't use the SVE part of the register for __builtin_apply and
+ __builtin_return. The SVE registers aren't used by the normal PCS,
+ so using them there would be a waste of time. The PCS extensions
+ for SVE types are fundamentally incompatible with the
+ __builtin_return/__builtin_apply interface. */
+ return as_a <fixed_size_mode> (V16QImode);
+ return default_get_reg_raw_mode (regno);
+}
+
/* Implement TARGET_FUNCTION_ARG_PADDING.
Small aggregate types are placed in the lowest memory address.
@@ -3472,6 +4271,41 @@ aarch64_restore_callee_saves (machine_mode mode,
}
}
+/* Return true if OFFSET is a signed 4-bit value multiplied by the size
+ of MODE. */
+
+static inline bool
+offset_4bit_signed_scaled_p (machine_mode mode, poly_int64 offset)
+{
+ HOST_WIDE_INT multiple;
+ return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
+ && IN_RANGE (multiple, -8, 7));
+}
+
+/* Return true if OFFSET is an unsigned 6-bit value multiplied by the size
+ of MODE. */
+
+static inline bool
+offset_6bit_unsigned_scaled_p (machine_mode mode, poly_int64 offset)
+{
+ HOST_WIDE_INT multiple;
+ return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
+ && IN_RANGE (multiple, 0, 63));
+}
+
+/* Return true if OFFSET is a signed 7-bit value multiplied by the size
+ of MODE. */
+
+bool
+aarch64_offset_7bit_signed_scaled_p (machine_mode mode, poly_int64 offset)
+{
+ HOST_WIDE_INT multiple;
+ return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
+ && IN_RANGE (multiple, -64, 63));
+}
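For the fixed-width case these predicates reduce to plain division; the following standalone sketch (not part of the patch, and assuming a constant mode size so the poly_int64 handling disappears) shows the pattern shared by the 4-, 6-, 7- and 9-bit scaled checks:

#include <stdbool.h>
#include <stdio.h>

static bool
offset_scaled_in_range_p (long long size, long long offset,
                          long long lo, long long hi)
{
  if (offset % size != 0)
    return false;
  long long multiple = offset / size;
  return multiple >= lo && multiple <= hi;
}

int
main (void)
{
  /* For a 16-byte mode, the 4-bit signed scaled range is [-128, 112]
     bytes and the 7-bit signed scaled range is [-1024, 1008] bytes.  */
  printf ("%d %d %d\n",
          offset_scaled_in_range_p (16, -128, -8, 7),
          offset_scaled_in_range_p (16, 1008, -64, 63),
          offset_scaled_in_range_p (16, 1009, -64, 63));
  return 0;
}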
+
+/* Return true if OFFSET is a signed 9-bit value. */
+
static inline bool
offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
poly_int64 offset)
@@ -3481,20 +4315,26 @@ offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
&& IN_RANGE (const_offset, -256, 255));
}
+/* Return true if OFFSET is a signed 9-bit value multiplied by the size
+ of MODE. */
+
static inline bool
-offset_12bit_unsigned_scaled_p (machine_mode mode, poly_int64 offset)
+offset_9bit_signed_scaled_p (machine_mode mode, poly_int64 offset)
{
HOST_WIDE_INT multiple;
return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
- && IN_RANGE (multiple, 0, 4095));
+ && IN_RANGE (multiple, -256, 255));
}
-bool
-aarch64_offset_7bit_signed_scaled_p (machine_mode mode, poly_int64 offset)
+/* Return true if OFFSET is an unsigned 12-bit value multiplied by the size
+ of MODE. */
+
+static inline bool
+offset_12bit_unsigned_scaled_p (machine_mode mode, poly_int64 offset)
{
HOST_WIDE_INT multiple;
return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
- && IN_RANGE (multiple, -64, 63));
+ && IN_RANGE (multiple, 0, 4095));
}
/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS. */
@@ -3713,6 +4553,18 @@ aarch64_set_handled_components (sbitmap components)
cfun->machine->reg_is_wrapped_separately[regno] = true;
}
+/* Add a REG_CFA_EXPRESSION note to INSN to say that register REG
+ is saved at BASE + OFFSET. */
+
+static void
+aarch64_add_cfa_expression (rtx_insn *insn, unsigned int reg,
+ rtx base, poly_int64 offset)
+{
+ rtx mem = gen_frame_mem (DImode, plus_constant (Pmode, base, offset));
+ add_reg_note (insn, REG_CFA_EXPRESSION,
+ gen_rtx_SET (mem, regno_reg_rtx[reg]));
+}
+
/* AArch64 stack frames generated by this compiler look like:
+-------------------------------+
@@ -3798,19 +4650,55 @@ aarch64_expand_prologue (void)
rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM);
rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM);
- aarch64_sub_sp (ip0_rtx, initial_adjust, true);
+ aarch64_sub_sp (ip0_rtx, ip1_rtx, initial_adjust, true);
if (callee_adjust != 0)
aarch64_push_regs (reg1, reg2, callee_adjust);
if (emit_frame_chain)
{
+ poly_int64 reg_offset = callee_adjust;
if (callee_adjust == 0)
- aarch64_save_callee_saves (DImode, callee_offset, R29_REGNUM,
- R30_REGNUM, false);
+ {
+ reg1 = R29_REGNUM;
+ reg2 = R30_REGNUM;
+ reg_offset = callee_offset;
+ aarch64_save_callee_saves (DImode, reg_offset, reg1, reg2, false);
+ }
aarch64_add_offset (Pmode, hard_frame_pointer_rtx,
- stack_pointer_rtx, callee_offset, ip1_rtx,
- frame_pointer_needed);
+ stack_pointer_rtx, callee_offset,
+ ip1_rtx, ip0_rtx, frame_pointer_needed);
+ if (frame_pointer_needed && !frame_size.is_constant ())
+ {
+ /* Variable-sized frames need to describe the save slot
+ address using DW_CFA_expression rather than DW_CFA_offset.
+ This means that, without taking further action, the
+ locations of the registers that we've already saved would
+ remain based on the stack pointer even after we redefine
+ the CFA based on the frame pointer. We therefore need new
+ DW_CFA_expressions to re-express the save slots with addresses
+ based on the frame pointer. */
+ rtx_insn *insn = get_last_insn ();
+ gcc_assert (RTX_FRAME_RELATED_P (insn));
+
+ /* Add an explicit CFA definition if this was previously
+ implicit. */
+ if (!find_reg_note (insn, REG_CFA_ADJUST_CFA, NULL_RTX))
+ {
+ rtx src = plus_constant (Pmode, stack_pointer_rtx,
+ callee_offset);
+ add_reg_note (insn, REG_CFA_ADJUST_CFA,
+ gen_rtx_SET (hard_frame_pointer_rtx, src));
+ }
+
+ /* Change the save slot expressions for the registers that
+ we've already saved. */
+ reg_offset -= callee_offset;
+ aarch64_add_cfa_expression (insn, reg2, hard_frame_pointer_rtx,
+ reg_offset + UNITS_PER_WORD);
+ aarch64_add_cfa_expression (insn, reg1, hard_frame_pointer_rtx,
+ reg_offset);
+ }
emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
}
@@ -3818,7 +4706,7 @@ aarch64_expand_prologue (void)
callee_adjust != 0 || emit_frame_chain);
aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
callee_adjust != 0 || emit_frame_chain);
- aarch64_sub_sp (ip1_rtx, final_adjust, !frame_pointer_needed);
+ aarch64_sub_sp (ip1_rtx, ip0_rtx, final_adjust, !frame_pointer_needed);
}
/* Return TRUE if we can use a simple_return insn.
@@ -3859,6 +4747,13 @@ aarch64_expand_epilogue (bool for_sibcall)
unsigned reg2 = cfun->machine->frame.wb_candidate2;
rtx cfi_ops = NULL;
rtx_insn *insn;
+ /* A stack clash protection prologue may not have left IP0_REGNUM or
+ IP1_REGNUM in a usable state. The same is true for allocations
+ with an SVE component, since we then need both temporary registers
+ for each allocation. */
+ bool can_inherit_p = (initial_adjust.is_constant ()
+ && final_adjust.is_constant ()
+ && !flag_stack_clash_protection);
/* We need to add memory barrier to prevent read from deallocated stack. */
bool need_barrier_p
@@ -3884,9 +4779,10 @@ aarch64_expand_epilogue (bool for_sibcall)
is restored on the instruction doing the writeback. */
aarch64_add_offset (Pmode, stack_pointer_rtx,
hard_frame_pointer_rtx, -callee_offset,
- ip1_rtx, callee_adjust == 0);
+ ip1_rtx, ip0_rtx, callee_adjust == 0);
else
- aarch64_add_sp (ip1_rtx, final_adjust, df_regs_ever_live_p (IP1_REGNUM));
+ aarch64_add_sp (ip1_rtx, ip0_rtx, final_adjust,
+ !can_inherit_p || df_regs_ever_live_p (IP1_REGNUM));
aarch64_restore_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
callee_adjust != 0, &cfi_ops);
@@ -3909,7 +4805,8 @@ aarch64_expand_epilogue (bool for_sibcall)
cfi_ops = NULL;
}
- aarch64_add_sp (ip0_rtx, initial_adjust, df_regs_ever_live_p (IP0_REGNUM));
+ aarch64_add_sp (ip0_rtx, ip1_rtx, initial_adjust,
+ !can_inherit_p || df_regs_ever_live_p (IP0_REGNUM));
if (cfi_ops)
{
@@ -4019,7 +4916,7 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
temp1 = gen_rtx_REG (Pmode, IP1_REGNUM);
if (vcall_offset == 0)
- aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1, false);
+ aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1, temp0, false);
else
{
gcc_assert ((vcall_offset & (POINTER_BYTES - 1)) == 0);
@@ -4031,8 +4928,8 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
addr = gen_rtx_PRE_MODIFY (Pmode, this_rtx,
plus_constant (Pmode, this_rtx, delta));
else
- aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1,
- false);
+ aarch64_add_offset (Pmode, this_rtx, this_rtx, delta,
+ temp1, temp0, false);
}
if (Pmode == ptr_mode)
@@ -4133,6 +5030,22 @@ aarch64_movw_imm (HOST_WIDE_INT val, scalar_int_mode mode)
|| (val & (((HOST_WIDE_INT) 0xffff) << 16)) == val);
}
+/* VAL is a value with the inner mode of MODE. Replicate it to fill a
+ 64-bit (DImode) integer. */
+
+static unsigned HOST_WIDE_INT
+aarch64_replicate_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode mode)
+{
+ unsigned int size = GET_MODE_UNIT_PRECISION (mode);
+ while (size < 64)
+ {
+ val &= (HOST_WIDE_INT_1U << size) - 1;
+ val |= val << size;
+ size *= 2;
+ }
+ return val;
+}
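A standalone copy of this helper (outside GCC, with the element precision passed as a plain bit count) makes the replication easy to try out:

#include <stdio.h>

static unsigned long long
replicate_bitmask_imm (unsigned long long val, unsigned int size)
{
  while (size < 64)
    {
      val &= (1ULL << size) - 1;
      val |= val << size;
      size *= 2;
    }
  return val;
}

int
main (void)
{
  /* A 16-bit 0x00ff becomes the 64-bit pattern 0x00ff00ff00ff00ff,
     which the 64-bit bitmask-immediate test can then accept.  */
  printf ("0x%016llx\n", replicate_bitmask_imm (0x00ff, 16));
  return 0;
}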
+
/* Multipliers for repeating bitmasks of width 32, 16, 8, 4, and 2. */
static const unsigned HOST_WIDE_INT bitmask_imm_mul[] =
@@ -4155,7 +5068,7 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, machine_mode mode)
/* Check for a single sequence of one bits and return quickly if so.
The special cases of all ones and all zeroes returns false. */
- val = (unsigned HOST_WIDE_INT) val_in;
+ val = aarch64_replicate_bitmask_imm (val_in, mode);
tmp = val + (val & -val);
if (tmp == (tmp & -tmp))
@@ -4257,10 +5170,16 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
if (GET_CODE (x) == HIGH)
return true;
+ /* There's no way to calculate VL-based values using relocations. */
+ subrtx_iterator::array_type array;
+ FOR_EACH_SUBRTX (iter, array, x, ALL)
+ if (GET_CODE (*iter) == CONST_POLY_INT)
+ return true;
+
split_const (x, &base, &offset);
if (GET_CODE (base) == SYMBOL_REF || GET_CODE (base) == LABEL_REF)
{
- if (aarch64_classify_symbol (base, offset)
+ if (aarch64_classify_symbol (base, INTVAL (offset))
!= SYMBOL_FORCE_TO_MEM)
return true;
else
@@ -4496,10 +5415,21 @@ aarch64_classify_index (struct aarch64_address_info *info, rtx x,
&& contains_reg_of_mode[GENERAL_REGS][GET_MODE (SUBREG_REG (index))])
index = SUBREG_REG (index);
- if ((shift == 0
- || (shift > 0 && shift <= 3
- && known_eq (1 << shift, GET_MODE_SIZE (mode))))
- && REG_P (index)
+ if (aarch64_sve_data_mode_p (mode))
+ {
+ if (type != ADDRESS_REG_REG
+ || (1 << shift) != GET_MODE_UNIT_SIZE (mode))
+ return false;
+ }
+ else
+ {
+ if (shift != 0
+ && !(IN_RANGE (shift, 1, 3)
+ && known_eq (1 << shift, GET_MODE_SIZE (mode))))
+ return false;
+ }
+
+ if (REG_P (index)
&& aarch64_regno_ok_for_index_p (REGNO (index), strict_p))
{
info->type = type;
@@ -4552,23 +5482,34 @@ aarch64_classify_address (struct aarch64_address_info *info,
/* On BE, we use load/store pair for all large int mode load/stores.
TI/TFmode may also use a load/store pair. */
+ unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+ bool advsimd_struct_p = (vec_flags == (VEC_ADVSIMD | VEC_STRUCT));
bool load_store_pair_p = (type == ADDR_QUERY_LDP_STP
|| mode == TImode
|| mode == TFmode
- || (BYTES_BIG_ENDIAN
- && aarch64_vect_struct_mode_p (mode)));
+ || (BYTES_BIG_ENDIAN && advsimd_struct_p));
bool allow_reg_index_p = (!load_store_pair_p
- && (maybe_ne (GET_MODE_SIZE (mode), 16)
- || aarch64_vector_mode_supported_p (mode))
- && !aarch64_vect_struct_mode_p (mode));
+ && (known_lt (GET_MODE_SIZE (mode), 16)
+ || vec_flags == VEC_ADVSIMD
+ || vec_flags == VEC_SVE_DATA));
+
+ /* For SVE, only accept [Rn], [Rn, Rm, LSL #shift] and
+ [Rn, #offset, MUL VL]. */
+ if ((vec_flags & (VEC_SVE_DATA | VEC_SVE_PRED)) != 0
+ && (code != REG && code != PLUS))
+ return false;
/* On LE, for AdvSIMD, don't support anything other than POST_INC or
REG addressing. */
- if (aarch64_vect_struct_mode_p (mode) && !BYTES_BIG_ENDIAN
+ if (advsimd_struct_p
+ && !BYTES_BIG_ENDIAN
&& (code != POST_INC && code != REG))
return false;
+ gcc_checking_assert (GET_MODE (x) == VOIDmode
+ || SCALAR_INT_MODE_P (GET_MODE (x)));
+
switch (code)
{
case REG:
@@ -4641,6 +5582,17 @@ aarch64_classify_address (struct aarch64_address_info *info,
&& aarch64_offset_7bit_signed_scaled_p (TImode,
offset + 32));
+ /* Make "m" use the LD1 offset range for SVE data modes, so
+ that pre-RTL optimizers like ivopts will work to that
+ instead of the wider LDR/STR range. */
+ if (vec_flags == VEC_SVE_DATA)
+ return (type == ADDR_QUERY_M
+ ? offset_4bit_signed_scaled_p (mode, offset)
+ : offset_9bit_signed_scaled_p (mode, offset));
+
+ if (vec_flags == VEC_SVE_PRED)
+ return offset_9bit_signed_scaled_p (mode, offset);
+
if (load_store_pair_p)
return ((known_eq (GET_MODE_SIZE (mode), 4)
|| known_eq (GET_MODE_SIZE (mode), 8))
@@ -4741,7 +5693,8 @@ aarch64_classify_address (struct aarch64_address_info *info,
rtx sym, offs;
split_const (info->offset, &sym, &offs);
if (GET_CODE (sym) == SYMBOL_REF
- && (aarch64_classify_symbol (sym, offs) == SYMBOL_SMALL_ABSOLUTE))
+ && (aarch64_classify_symbol (sym, INTVAL (offs))
+ == SYMBOL_SMALL_ABSOLUTE))
{
/* The symbol and offset must be aligned to the access size. */
unsigned int align;
@@ -4812,7 +5765,7 @@ aarch64_classify_symbolic_expression (rtx x)
rtx offset;
split_const (x, &x, &offset);
- return aarch64_classify_symbol (x, offset);
+ return aarch64_classify_symbol (x, INTVAL (offset));
}
@@ -5265,6 +6218,33 @@ aarch64_const_vec_all_same_int_p (rtx x, HOST_WIDE_INT val)
return aarch64_const_vec_all_same_in_range_p (x, val, val);
}
+/* Return true if VEC is a constant in which every element is in the range
+ [MINVAL, MAXVAL]. The elements do not need to have the same value. */
+
+static bool
+aarch64_const_vec_all_in_range_p (rtx vec,
+ HOST_WIDE_INT minval,
+ HOST_WIDE_INT maxval)
+{
+ if (GET_CODE (vec) != CONST_VECTOR
+ || GET_MODE_CLASS (GET_MODE (vec)) != MODE_VECTOR_INT)
+ return false;
+
+ int nunits;
+ if (!CONST_VECTOR_STEPPED_P (vec))
+ nunits = const_vector_encoded_nelts (vec);
+ else if (!CONST_VECTOR_NUNITS (vec).is_constant (&nunits))
+ return false;
+
+ for (int i = 0; i < nunits; i++)
+ {
+ rtx vec_elem = CONST_VECTOR_ELT (vec, i);
+ if (!CONST_INT_P (vec_elem)
+ || !IN_RANGE (INTVAL (vec_elem), minval, maxval))
+ return false;
+ }
+ return true;
+}
/* N Z C V. */
#define AARCH64_CC_V 1
@@ -5293,10 +6273,43 @@ static const int aarch64_nzcv_codes[] =
0 /* NV, Any. */
};
+/* Print floating-point vector immediate operand X to F, negating it
+ first if NEGATE is true. Return true on success, false if it isn't
+ a constant we can handle. */
+
+static bool
+aarch64_print_vector_float_operand (FILE *f, rtx x, bool negate)
+{
+ rtx elt;
+
+ if (!const_vec_duplicate_p (x, &elt))
+ return false;
+
+ REAL_VALUE_TYPE r = *CONST_DOUBLE_REAL_VALUE (elt);
+ if (negate)
+ r = real_value_negate (&r);
+
+ /* We only handle the SVE single-bit immediates here. */
+ if (real_equal (&r, &dconst0))
+ asm_fprintf (f, "0.0");
+ else if (real_equal (&r, &dconst1))
+ asm_fprintf (f, "1.0");
+ else if (real_equal (&r, &dconsthalf))
+ asm_fprintf (f, "0.5");
+ else
+ return false;
+
+ return true;
+}
+
/* Print operand X to file F in a target specific manner according to CODE.
The acceptable formatting commands given by CODE are:
'c': An integer or symbol address without a preceding #
sign.
+ 'C': Take the duplicated element in a vector constant
+ and print it in hex.
+ 'D': Take the duplicated element in a vector constant
+ and print it as an unsigned integer, in decimal.
'e': Print the sign/zero-extend size as a character 8->b,
16->h, 32->w.
'p': Prints N such that 2^N == X (X must be power of 2 and
@@ -5306,6 +6319,8 @@ static const int aarch64_nzcv_codes[] =
of regs.
'm': Print a condition (eq, ne, etc).
'M': Same as 'm', but invert condition.
+ 'N': Take the duplicated element in a vector constant
+ and print the negative of it in decimal.
'b/h/s/d/q': Print a scalar FP/SIMD register name.
'S/T/U/V': Print a FP/SIMD register name for a register list.
The register printed is the FP/SIMD register name
@@ -5332,6 +6347,7 @@ static const int aarch64_nzcv_codes[] =
static void
aarch64_print_operand (FILE *f, rtx x, int code)
{
+ rtx elt;
switch (code)
{
case 'c':
@@ -5448,6 +6464,25 @@ aarch64_print_operand (FILE *f, rtx x, int code)
}
break;
+ case 'N':
+ if (!const_vec_duplicate_p (x, &elt))
+ {
+ output_operand_lossage ("invalid vector constant");
+ return;
+ }
+
+ if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT)
+ asm_fprintf (f, "%wd", -INTVAL (elt));
+ else if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_FLOAT
+ && aarch64_print_vector_float_operand (f, x, true))
+ ;
+ else
+ {
+ output_operand_lossage ("invalid vector constant");
+ return;
+ }
+ break;
+
case 'b':
case 'h':
case 's':
@@ -5470,7 +6505,9 @@ aarch64_print_operand (FILE *f, rtx x, int code)
output_operand_lossage ("incompatible floating point / vector register operand for '%%%c'", code);
return;
}
- asm_fprintf (f, "v%d", REGNO (x) - V0_REGNUM + (code - 'S'));
+ asm_fprintf (f, "%c%d",
+ aarch64_sve_data_mode_p (GET_MODE (x)) ? 'z' : 'v',
+ REGNO (x) - V0_REGNUM + (code - 'S'));
break;
case 'R':
@@ -5491,6 +6528,33 @@ aarch64_print_operand (FILE *f, rtx x, int code)
asm_fprintf (f, "0x%wx", UINTVAL (x) & 0xffff);
break;
+ case 'C':
+ {
+ /* Print a replicated constant in hex. */
+ if (!const_vec_duplicate_p (x, &elt) || !CONST_INT_P (elt))
+ {
+ output_operand_lossage ("invalid operand for '%%%c'", code);
+ return;
+ }
+ scalar_mode inner_mode = GET_MODE_INNER (GET_MODE (x));
+ asm_fprintf (f, "0x%wx", UINTVAL (elt) & GET_MODE_MASK (inner_mode));
+ }
+ break;
+
+ case 'D':
+ {
+ /* Print a replicated constant in decimal, treating it as
+ unsigned. */
+ if (!const_vec_duplicate_p (x, &elt) || !CONST_INT_P (elt))
+ {
+ output_operand_lossage ("invalid operand for '%%%c'", code);
+ return;
+ }
+ scalar_mode inner_mode = GET_MODE_INNER (GET_MODE (x));
+ asm_fprintf (f, "%wd", UINTVAL (elt) & GET_MODE_MASK (inner_mode));
+ }
+ break;
+
case 'w':
case 'x':
if (x == const0_rtx
@@ -5524,14 +6588,16 @@ aarch64_print_operand (FILE *f, rtx x, int code)
switch (GET_CODE (x))
{
case REG:
- asm_fprintf (f, "%s", reg_names [REGNO (x)]);
+ if (aarch64_sve_data_mode_p (GET_MODE (x)))
+ asm_fprintf (f, "z%d", REGNO (x) - V0_REGNUM);
+ else
+ asm_fprintf (f, "%s", reg_names [REGNO (x)]);
break;
case MEM:
output_address (GET_MODE (x), XEXP (x, 0));
break;
- case CONST:
case LABEL_REF:
case SYMBOL_REF:
output_addr_const (asm_out_file, x);
@@ -5541,21 +6607,31 @@ aarch64_print_operand (FILE *f, rtx x, int code)
asm_fprintf (f, "%wd", INTVAL (x));
break;
- case CONST_VECTOR:
- if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT)
+ case CONST:
+ if (!VECTOR_MODE_P (GET_MODE (x)))
{
- gcc_assert (
- aarch64_const_vec_all_same_in_range_p (x,
- HOST_WIDE_INT_MIN,
- HOST_WIDE_INT_MAX));
- asm_fprintf (f, "%wd", INTVAL (CONST_VECTOR_ELT (x, 0)));
+ output_addr_const (asm_out_file, x);
+ break;
}
- else if (aarch64_simd_imm_zero_p (x, GET_MODE (x)))
+ /* fall through */
+
+ case CONST_VECTOR:
+ if (!const_vec_duplicate_p (x, &elt))
{
- fputc ('0', f);
+ output_operand_lossage ("invalid vector constant");
+ return;
}
+
+ if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT)
+ asm_fprintf (f, "%wd", INTVAL (elt));
+ else if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_FLOAT
+ && aarch64_print_vector_float_operand (f, x, false))
+ ;
else
- gcc_unreachable ();
+ {
+ output_operand_lossage ("invalid vector constant");
+ return;
+ }
break;
case CONST_DOUBLE:
@@ -5740,6 +6816,22 @@ aarch64_print_address_internal (FILE *f, machine_mode mode, rtx x,
case ADDRESS_REG_IMM:
if (known_eq (addr.const_offset, 0))
asm_fprintf (f, "[%s]", reg_names [REGNO (addr.base)]);
+ else if (aarch64_sve_data_mode_p (mode))
+ {
+ HOST_WIDE_INT vnum
+ = exact_div (addr.const_offset,
+ BYTES_PER_SVE_VECTOR).to_constant ();
+ asm_fprintf (f, "[%s, #%wd, mul vl]",
+ reg_names[REGNO (addr.base)], vnum);
+ }
+ else if (aarch64_sve_pred_mode_p (mode))
+ {
+ HOST_WIDE_INT vnum
+ = exact_div (addr.const_offset,
+ BYTES_PER_SVE_PRED).to_constant ();
+ asm_fprintf (f, "[%s, #%wd, mul vl]",
+ reg_names[REGNO (addr.base)], vnum);
+ }
else
asm_fprintf (f, "[%s, %wd]", reg_names [REGNO (addr.base)],
INTVAL (addr.offset));
@@ -5827,7 +6919,7 @@ aarch64_print_ldpstp_address (FILE *f, machine_mode mode, rtx x)
static void
aarch64_print_operand_address (FILE *f, machine_mode mode, rtx x)
{
- if (!aarch64_print_address_internal (f, mode, x, ADDR_QUERY_M))
+ if (!aarch64_print_address_internal (f, mode, x, ADDR_QUERY_ANY))
output_addr_const (f, x);
}
@@ -5882,6 +6974,9 @@ aarch64_regno_regclass (unsigned regno)
if (FP_REGNUM_P (regno))
return FP_LO_REGNUM_P (regno) ? FP_LO_REGS : FP_REGS;
+ if (PR_REGNUM_P (regno))
+ return PR_LO_REGNUM_P (regno) ? PR_LO_REGS : PR_HI_REGS;
+
return NO_REGS;
}
@@ -6035,6 +7130,14 @@ aarch64_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x,
machine_mode mode,
secondary_reload_info *sri)
{
+ if (BYTES_BIG_ENDIAN
+ && reg_class_subset_p (rclass, FP_REGS)
+ && (MEM_P (x) || (REG_P (x) && !HARD_REGISTER_P (x)))
+ && aarch64_sve_data_mode_p (mode))
+ {
+ sri->icode = CODE_FOR_aarch64_sve_reload_be;
+ return NO_REGS;
+ }
/* If we have to disable direct literal pool loads and stores because the
function is too big, then we need a scratch register. */
@@ -6176,6 +7279,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
can hold MODE, but at the moment we need to handle all modes.
Just ignore any runtime parts for registers that can't store them. */
HOST_WIDE_INT lowest_size = constant_lower_bound (GET_MODE_SIZE (mode));
+ unsigned int nregs;
switch (regclass)
{
case CALLER_SAVE_REGS:
@@ -6185,10 +7289,17 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
case POINTER_AND_FP_REGS:
case FP_REGS:
case FP_LO_REGS:
- return (aarch64_vector_mode_p (mode)
+ if (aarch64_sve_data_mode_p (mode)
+ && constant_multiple_p (GET_MODE_SIZE (mode),
+ BYTES_PER_SVE_VECTOR, &nregs))
+ return nregs;
+ return (aarch64_vector_data_mode_p (mode)
? CEIL (lowest_size, UNITS_PER_VREG)
: CEIL (lowest_size, UNITS_PER_WORD));
case STACK_REG:
+ case PR_REGS:
+ case PR_LO_REGS:
+ case PR_HI_REGS:
return 1;
case NO_REGS:
@@ -7497,8 +8608,8 @@ cost_plus:
}
if (GET_MODE_CLASS (mode) == MODE_INT
- && CONST_INT_P (op1)
- && aarch64_uimm12_shift (INTVAL (op1)))
+ && ((CONST_INT_P (op1) && aarch64_uimm12_shift (INTVAL (op1)))
+ || aarch64_sve_addvl_addpl_immediate (op1, mode)))
{
*cost += rtx_cost (op0, mode, PLUS, 0, speed);
@@ -9415,6 +10526,21 @@ aarch64_get_arch (enum aarch64_arch arch)
return &all_architectures[cpu->arch];
}
+/* Return the VG value associated with -msve-vector-bits= value VALUE. */
+
+static poly_uint16
+aarch64_convert_sve_vector_bits (aarch64_sve_vector_bits_enum value)
+{
+ /* For now generate vector-length agnostic code for -msve-vector-bits=128.
+ This ensures we can clearly distinguish SVE and Advanced SIMD modes when
+ deciding which .md file patterns to use and when deciding whether
+ something is a legitimate address or constant. */
+ if (value == SVE_SCALABLE || value == SVE_128)
+ return poly_uint16 (2, 2);
+ else
+ return (int) value / 64;
+}
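A quick sketch of the resulting mapping (not in the patch, and assuming, as the division by 64 implies, that the enum values other than SVE_SCALABLE are the bit counts themselves): VG counts 64-bit granules, so fixed widths divide by 64, while "scalable" and 128 both take the runtime-variable encoding:

#include <stdio.h>

static int
sve_bits_to_vg (int bits)
{
  if (bits == 0 || bits == 128)
    return -1;                  /* stands in for poly_uint16 (2, 2) */
  return bits / 64;
}

int
main (void)
{
  printf ("%d %d %d\n", sve_bits_to_vg (256), sve_bits_to_vg (512),
          sve_bits_to_vg (2048));       /* prints 4 8 32 */
  return 0;
}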
+
/* Implement TARGET_OPTION_OVERRIDE. This is called once in the beginning
and is used to parse the -m{cpu,tune,arch} strings and setup the initial
tuning structs. In particular it must set selected_tune and
@@ -9516,6 +10642,9 @@ aarch64_override_options (void)
error ("assembler does not support -mabi=ilp32");
#endif
+ /* Convert -msve-vector-bits to a VG count. */
+ aarch64_sve_vg = aarch64_convert_sve_vector_bits (aarch64_sve_vector_bits);
+
if (aarch64_ra_sign_scope != AARCH64_FUNCTION_NONE && TARGET_ILP32)
sorry ("return address signing is only supported for -mabi=lp64");
@@ -10392,11 +11521,11 @@ aarch64_classify_tls_symbol (rtx x)
}
}
-/* Return the method that should be used to access SYMBOL_REF or
- LABEL_REF X. */
+/* Return the correct method for accessing X + OFFSET, where X is either
+ a SYMBOL_REF or LABEL_REF. */
enum aarch64_symbol_type
-aarch64_classify_symbol (rtx x, rtx offset)
+aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset)
{
if (GET_CODE (x) == LABEL_REF)
{
@@ -10439,7 +11568,7 @@ aarch64_classify_symbol (rtx x, rtx offset)
resolve to a symbol in this module, then force to memory. */
if ((SYMBOL_REF_WEAK (x)
&& !aarch64_symbol_binds_local_p (x))
- || INTVAL (offset) < -1048575 || INTVAL (offset) > 1048575)
+ || !IN_RANGE (offset, -1048575, 1048575))
return SYMBOL_FORCE_TO_MEM;
return SYMBOL_TINY_ABSOLUTE;
@@ -10448,7 +11577,7 @@ aarch64_classify_symbol (rtx x, rtx offset)
4G. */
if ((SYMBOL_REF_WEAK (x)
&& !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (INTVAL (offset), HOST_WIDE_INT_C (-4294967263),
+ || !IN_RANGE (offset, HOST_WIDE_INT_C (-4294967263),
HOST_WIDE_INT_C (4294967264)))
return SYMBOL_FORCE_TO_MEM;
return SYMBOL_SMALL_ABSOLUTE;
@@ -10511,28 +11640,46 @@ aarch64_legitimate_constant_p (machine_mode mode, rtx x)
if (CONST_INT_P (x) || CONST_DOUBLE_P (x) || GET_CODE (x) == CONST_VECTOR)
return true;
- /* Do not allow vector struct mode constants. We could support
- 0 and -1 easily, but they need support in aarch64-simd.md. */
- if (aarch64_vect_struct_mode_p (mode))
+ /* Do not allow vector struct mode constants for Advanced SIMD.
+ We could support 0 and -1 easily, but they need support in
+ aarch64-simd.md. */
+ unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+ if (vec_flags == (VEC_ADVSIMD | VEC_STRUCT))
return false;
/* Do not allow wide int constants - this requires support in movti. */
if (CONST_WIDE_INT_P (x))
return false;
+ /* Only accept variable-length vector constants if they can be
+ handled directly.
+
+ ??? It would be possible to handle rematerialization of other
+ constants via secondary reloads. */
+ if (vec_flags & VEC_ANY_SVE)
+ return aarch64_simd_valid_immediate (x, NULL);
+
if (GET_CODE (x) == HIGH)
x = XEXP (x, 0);
- /* Do not allow const (plus (anchor_symbol, const_int)). */
- if (GET_CODE (x) == CONST)
- {
- rtx offset;
-
- split_const (x, &x, &offset);
+ /* Accept polynomial constants that can be calculated by using the
+ destination of a move as the sole temporary. Constants that
+ require a second temporary cannot be rematerialized (they can't be
+ forced to memory and also aren't legitimate constants). */
+ poly_int64 offset;
+ if (poly_int_rtx_p (x, &offset))
+ return aarch64_offset_temporaries (false, offset) <= 1;
+
+ /* If an offset is being added to something else, we need to allow the
+ base to be moved into the destination register, meaning that there
+ are no free temporaries for the offset. */
+ x = strip_offset (x, &offset);
+ if (!offset.is_constant () && aarch64_offset_temporaries (true, offset) > 0)
+ return false;
- if (SYMBOL_REF_P (x) && SYMBOL_REF_ANCHOR_P (x))
- return false;
- }
+ /* Do not allow const (plus (anchor_symbol, const_int)). */
+ if (maybe_ne (offset, 0) && SYMBOL_REF_P (x) && SYMBOL_REF_ANCHOR_P (x))
+ return false;
/* Treat symbols as constants. Avoid TLS symbols as they are complex,
so spilling them is better than rematerialization. */
@@ -11079,6 +12226,12 @@ aarch64_conditional_register_usage (void)
call_used_regs[i] = 1;
}
}
+ if (!TARGET_SVE)
+ for (i = P0_REGNUM; i <= P15_REGNUM; i++)
+ {
+ fixed_regs[i] = 1;
+ call_used_regs[i] = 1;
+ }
}
/* Walk down the type tree of TYPE counting consecutive base elements.
@@ -11372,28 +12525,40 @@ aarch64_struct_value_rtx (tree fndecl ATTRIBUTE_UNUSED,
static bool
aarch64_vector_mode_supported_p (machine_mode mode)
{
- if (TARGET_SIMD
- && (mode == V4SImode || mode == V8HImode
- || mode == V16QImode || mode == V2DImode
- || mode == V2SImode || mode == V4HImode
- || mode == V8QImode || mode == V2SFmode
- || mode == V4SFmode || mode == V2DFmode
- || mode == V4HFmode || mode == V8HFmode
- || mode == V1DFmode))
- return true;
-
- return false;
+ unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+ return vec_flags != 0 && (vec_flags & VEC_STRUCT) == 0;
}
/* Return appropriate SIMD container
for MODE within a vector of WIDTH bits. */
static machine_mode
-aarch64_simd_container_mode (scalar_mode mode, unsigned width)
+aarch64_simd_container_mode (scalar_mode mode, poly_int64 width)
{
- gcc_assert (width == 64 || width == 128);
+ if (TARGET_SVE && known_eq (width, BITS_PER_SVE_VECTOR))
+ switch (mode)
+ {
+ case E_DFmode:
+ return VNx2DFmode;
+ case E_SFmode:
+ return VNx4SFmode;
+ case E_HFmode:
+ return VNx8HFmode;
+ case E_DImode:
+ return VNx2DImode;
+ case E_SImode:
+ return VNx4SImode;
+ case E_HImode:
+ return VNx8HImode;
+ case E_QImode:
+ return VNx16QImode;
+ default:
+ return word_mode;
+ }
+
+ gcc_assert (known_eq (width, 64) || known_eq (width, 128));
if (TARGET_SIMD)
{
- if (width == 128)
+ if (known_eq (width, 128))
switch (mode)
{
case E_DFmode:
@@ -11437,7 +12602,8 @@ aarch64_simd_container_mode (scalar_mode mode, unsigned width)
static machine_mode
aarch64_preferred_simd_mode (scalar_mode mode)
{
- return aarch64_simd_container_mode (mode, 128);
+ poly_int64 bits = TARGET_SVE ? BITS_PER_SVE_VECTOR : 128;
+ return aarch64_simd_container_mode (mode, bits);
}
/* Return a list of possible vector sizes for the vectorizer
@@ -11445,6 +12611,8 @@ aarch64_preferred_simd_mode (scalar_mode mode)
static void
aarch64_autovectorize_vector_sizes (vector_sizes *sizes)
{
+ if (TARGET_SVE)
+ sizes->safe_push (BYTES_PER_SVE_VECTOR);
sizes->safe_push (16);
sizes->safe_push (8);
}
@@ -11606,6 +12774,125 @@ sizetochar (int size)
}
}
+/* Return true if BASE_OR_STEP is a valid immediate operand for an SVE INDEX
+ instruction. */
+
+bool
+aarch64_sve_index_immediate_p (rtx base_or_step)
+{
+ return (CONST_INT_P (base_or_step)
+ && IN_RANGE (INTVAL (base_or_step), -16, 15));
+}
+
+/* Return true if X is a valid immediate for the SVE ADD and SUB
+ instructions. Negate X first if NEGATE_P is true. */
+
+bool
+aarch64_sve_arith_immediate_p (rtx x, bool negate_p)
+{
+ rtx elt;
+
+ if (!const_vec_duplicate_p (x, &elt)
+ || !CONST_INT_P (elt))
+ return false;
+
+ HOST_WIDE_INT val = INTVAL (elt);
+ if (negate_p)
+ val = -val;
+ val &= GET_MODE_MASK (GET_MODE_INNER (GET_MODE (x)));
+
+ if (val & 0xff)
+ return IN_RANGE (val, 0, 0xff);
+ return IN_RANGE (val, 0, 0xff00);
+}
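A minimal standalone sketch of the range check above, assuming the value has
already been reduced to the element width as in aarch64_sve_arith_immediate_p;
the helper name and test values are illustrative only:

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  /* True if VAL fits the SVE ADD/SUB immediate encoding: an unsigned
     8-bit value, optionally shifted left by 8.  */
  static bool
  sve_add_sub_imm_ok (uint64_t val)
  {
    if (val & 0xff)
      return val <= 0xff;    /* 8-bit immediate, no shift.  */
    return val <= 0xff00;    /* 8-bit immediate, LSL #8.  */
  }

  int
  main (void)
  {
    /* Prints "1 1 0": 255 and 0x1200 (0x12 << 8) are encodable, 0x101 is not.  */
    printf ("%d %d %d\n", sve_add_sub_imm_ok (255),
            sve_add_sub_imm_ok (0x1200), sve_add_sub_imm_ok (0x101));
    return 0;
  }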
+
+/* Return true if X is a valid immediate operand for an SVE logical
+ instruction such as AND. */
+
+bool
+aarch64_sve_bitmask_immediate_p (rtx x)
+{
+ rtx elt;
+
+ return (const_vec_duplicate_p (x, &elt)
+ && CONST_INT_P (elt)
+ && aarch64_bitmask_imm (INTVAL (elt),
+ GET_MODE_INNER (GET_MODE (x))));
+}
+
+/* Return true if X is a valid immediate for the SVE DUP and CPY
+ instructions. */
+
+bool
+aarch64_sve_dup_immediate_p (rtx x)
+{
+ rtx elt;
+
+ if (!const_vec_duplicate_p (x, &elt)
+ || !CONST_INT_P (elt))
+ return false;
+
+ HOST_WIDE_INT val = INTVAL (elt);
+ if (val & 0xff)
+ return IN_RANGE (val, -0x80, 0x7f);
+ return IN_RANGE (val, -0x8000, 0x7f00);
+}
+
+/* Return true if X is a valid immediate operand for an SVE CMP instruction.
+ SIGNED_P says whether the operand is signed rather than unsigned. */
+
+bool
+aarch64_sve_cmp_immediate_p (rtx x, bool signed_p)
+{
+ rtx elt;
+
+ return (const_vec_duplicate_p (x, &elt)
+ && CONST_INT_P (elt)
+ && (signed_p
+ ? IN_RANGE (INTVAL (elt), -16, 15)
+ : IN_RANGE (INTVAL (elt), 0, 127)));
+}
+
+/* Return true if X is a valid immediate operand for an SVE FADD or FSUB
+ instruction. Negate X first if NEGATE_P is true. */
+
+bool
+aarch64_sve_float_arith_immediate_p (rtx x, bool negate_p)
+{
+ rtx elt;
+ REAL_VALUE_TYPE r;
+
+ if (!const_vec_duplicate_p (x, &elt)
+ || GET_CODE (elt) != CONST_DOUBLE)
+ return false;
+
+ r = *CONST_DOUBLE_REAL_VALUE (elt);
+
+ if (negate_p)
+ r = real_value_negate (&r);
+
+ if (real_equal (&r, &dconst1))
+ return true;
+ if (real_equal (&r, &dconsthalf))
+ return true;
+ return false;
+}
+
+/* Return true if X is a valid immediate operand for an SVE FMUL
+ instruction. */
+
+bool
+aarch64_sve_float_mul_immediate_p (rtx x)
+{
+ rtx elt;
+
+ /* GCC will never generate a multiply with an immediate of 2, so there is no
+ point testing for it (even though it is a valid constant). */
+ return (const_vec_duplicate_p (x, &elt)
+ && GET_CODE (elt) == CONST_DOUBLE
+ && real_equal (CONST_DOUBLE_REAL_VALUE (elt), &dconsthalf));
+}
+
/* Return true if replicating VAL32 is a valid 2-byte or 4-byte immediate
for the Advanced SIMD operation described by WHICH and INSN. If INFO
is nonnull, use it to describe valid immediates. */
@@ -11710,6 +12997,52 @@ aarch64_advsimd_valid_immediate (unsigned HOST_WIDE_INT val64,
return false;
}
+/* Return true if replicating VAL64 gives a valid immediate for an SVE MOV
+ instruction. If INFO is nonnull, use it to describe valid immediates. */
+
+static bool
+aarch64_sve_valid_immediate (unsigned HOST_WIDE_INT val64,
+ simd_immediate_info *info)
+{
+ scalar_int_mode mode = DImode;
+ unsigned int val32 = val64 & 0xffffffff;
+ if (val32 == (val64 >> 32))
+ {
+ mode = SImode;
+ unsigned int val16 = val32 & 0xffff;
+ if (val16 == (val32 >> 16))
+ {
+ mode = HImode;
+ unsigned int val8 = val16 & 0xff;
+ if (val8 == (val16 >> 8))
+ mode = QImode;
+ }
+ }
+ HOST_WIDE_INT val = trunc_int_for_mode (val64, mode);
+ if (IN_RANGE (val, -0x80, 0x7f))
+ {
+ /* DUP with no shift. */
+ if (info)
+ *info = simd_immediate_info (mode, val);
+ return true;
+ }
+ if ((val & 0xff) == 0 && IN_RANGE (val, -0x8000, 0x7f00))
+ {
+ /* DUP with LSL #8. */
+ if (info)
+ *info = simd_immediate_info (mode, val);
+ return true;
+ }
+ if (aarch64_bitmask_imm (val64, mode))
+ {
+ /* DUPM. */
+ if (info)
+ *info = simd_immediate_info (mode, val);
+ return true;
+ }
+ return false;
+}
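A standalone sketch of the mode-narrowing step above, which picks the smallest
element width whose replication reproduces the 64-bit pattern; the helper name
and test values are illustrative only:

  #include <stdint.h>
  #include <stdio.h>

  /* Return the smallest element width, in bits, whose replication
     reproduces VAL64 (64 if the pattern does not repeat).  */
  static unsigned int
  narrowest_replicated_width (uint64_t val64)
  {
    unsigned int width = 64;
    uint64_t val = val64;
    while (width > 8)
      {
        uint64_t lo = val & ((1ULL << (width / 2)) - 1);
        if (lo != val >> (width / 2))
          break;
        val = lo;
        width /= 2;
      }
    return width;
  }

  int
  main (void)
  {
    printf ("%u\n", narrowest_replicated_width (0x2121212121212121ULL)); /* 8 */
    printf ("%u\n", narrowest_replicated_width (0x0001000100010001ULL)); /* 16 */
    printf ("%u\n", narrowest_replicated_width (0x1234567812345678ULL)); /* 32 */
    return 0;
  }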
+
/* Return true if OP is a valid SIMD immediate for the operation
described by WHICH. If INFO is nonnull, use it to describe valid
immediates. */
@@ -11717,18 +13050,39 @@ bool
aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
enum simd_immediate_check which)
{
- rtx elt = NULL;
+ machine_mode mode = GET_MODE (op);
+ unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+ if (vec_flags == 0 || vec_flags == (VEC_ADVSIMD | VEC_STRUCT))
+ return false;
+
+ scalar_mode elt_mode = GET_MODE_INNER (mode);
+ rtx elt = NULL, base, step;
unsigned int n_elts;
if (const_vec_duplicate_p (op, &elt))
n_elts = 1;
+ else if ((vec_flags & VEC_SVE_DATA)
+ && const_vec_series_p (op, &base, &step))
+ {
+ gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
+ if (!aarch64_sve_index_immediate_p (base)
+ || !aarch64_sve_index_immediate_p (step))
+ return false;
+
+ if (info)
+ *info = simd_immediate_info (elt_mode, base, step);
+ return true;
+ }
else if (GET_CODE (op) == CONST_VECTOR
&& CONST_VECTOR_NUNITS (op).is_constant (&n_elts))
/* N_ELTS set above. */;
else
return false;
- machine_mode mode = GET_MODE (op);
- scalar_mode elt_mode = GET_MODE_INNER (mode);
+ /* Handle PFALSE and PTRUE. */
+ if (vec_flags & VEC_SVE_PRED)
+ return (op == CONST0_RTX (mode)
+ || op == CONSTM1_RTX (mode));
+
scalar_float_mode elt_float_mode;
if (elt
&& is_a <scalar_float_mode> (elt_mode, &elt_float_mode)
@@ -11785,7 +13139,24 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
val64 |= ((unsigned HOST_WIDE_INT) bytes[i % nbytes]
<< (i * BITS_PER_UNIT));
- return aarch64_advsimd_valid_immediate (val64, info, which);
+ if (vec_flags & VEC_SVE_DATA)
+ return aarch64_sve_valid_immediate (val64, info);
+ else
+ return aarch64_advsimd_valid_immediate (val64, info, which);
+}
+
+/* Check whether X is a VEC_SERIES-like constant that starts at 0 and
+ has a step in the range of INDEX. Return the index expression if so,
+ otherwise return null. */
+rtx
+aarch64_check_zero_based_sve_index_immediate (rtx x)
+{
+ rtx base, step;
+ if (const_vec_series_p (x, &base, &step)
+ && base == const0_rtx
+ && aarch64_sve_index_immediate_p (step))
+ return step;
+ return NULL_RTX;
}
/* Check of immediate shift constants are within range. */
@@ -11799,16 +13170,6 @@ aarch64_simd_shift_imm_p (rtx x, machine_mode mode, bool left)
return aarch64_const_vec_all_same_in_range_p (x, 1, bit_width);
}
-/* Return true if X is a uniform vector where all elements
- are either the floating-point constant 0.0 or the
- integer constant 0. */
-bool
-aarch64_simd_imm_zero_p (rtx x, machine_mode mode)
-{
- return x == CONST0_RTX (mode);
-}
-
-
/* Return the bitmask CONST_INT to select the bits required by a zero extract
operation of width WIDTH at bit position POS. */
@@ -11833,9 +13194,15 @@ aarch64_mov_operand_p (rtx x, machine_mode mode)
if (CONST_INT_P (x))
return true;
+ if (VECTOR_MODE_P (GET_MODE (x)))
+ return aarch64_simd_valid_immediate (x, NULL);
+
if (GET_CODE (x) == SYMBOL_REF && mode == DImode && CONSTANT_ADDRESS_P (x))
return true;
+ if (aarch64_sve_cnt_immediate_p (x))
+ return true;
+
return aarch64_classify_symbolic_expression (x)
== SYMBOL_TINY_ABSOLUTE;
}
@@ -11855,7 +13222,7 @@ aarch64_simd_scalar_immediate_valid_for_move (rtx op, scalar_int_mode mode)
{
machine_mode vmode;
- vmode = aarch64_preferred_simd_mode (mode);
+ vmode = aarch64_simd_container_mode (mode, 64);
rtx op_v = aarch64_simd_gen_const_vector_dup (vmode, INTVAL (op));
return aarch64_simd_valid_immediate (op_v, NULL);
}
@@ -11965,6 +13332,7 @@ aarch64_endian_lane_rtx (machine_mode mode, unsigned int n)
}
/* Return TRUE if OP is a valid vector addressing mode. */
+
bool
aarch64_simd_mem_operand_p (rtx op)
{
@@ -11972,6 +13340,34 @@ aarch64_simd_mem_operand_p (rtx op)
|| REG_P (XEXP (op, 0)));
}
+/* Return true if OP is a valid MEM operand for an SVE LD1R instruction. */
+
+bool
+aarch64_sve_ld1r_operand_p (rtx op)
+{
+ struct aarch64_address_info addr;
+ scalar_mode mode;
+
+ return (MEM_P (op)
+ && is_a <scalar_mode> (GET_MODE (op), &mode)
+ && aarch64_classify_address (&addr, XEXP (op, 0), mode, false)
+ && addr.type == ADDRESS_REG_IMM
+ && offset_6bit_unsigned_scaled_p (mode, addr.const_offset));
+}
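For example, with a 4-byte element mode this accepts register-plus-immediate
addresses whose offset is a multiple of 4 in the range [0, 252], i.e. the
6-bit unsigned scaled immediate field of LD1R.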
+
+/* Return true if OP is a valid MEM operand for an SVE LDR instruction.
+ The conditions for STR are the same. */
+bool
+aarch64_sve_ldr_operand_p (rtx op)
+{
+ struct aarch64_address_info addr;
+
+ return (MEM_P (op)
+ && aarch64_classify_address (&addr, XEXP (op, 0), GET_MODE (op),
+ false, ADDR_QUERY_ANY)
+ && addr.type == ADDRESS_REG_IMM);
+}
+
/* Emit a register copy from operand to operand, taking care not to
early-clobber source registers in the process.
@@ -12006,14 +13402,36 @@ aarch64_simd_attr_length_rglist (machine_mode mode)
}
/* Implement target hook TARGET_VECTOR_ALIGNMENT. The AAPCS64 sets the maximum
- alignment of a vector to 128 bits. */
+ alignment of a vector to 128 bits. SVE predicates have an alignment of
+ 16 bits. */
static HOST_WIDE_INT
aarch64_simd_vector_alignment (const_tree type)
{
+ if (TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST)
+ /* ??? Checking the mode isn't ideal, but VECTOR_BOOLEAN_TYPE_P can
+ be set for non-predicate vectors of booleans. Modes are the most
+ direct way we have of identifying real SVE predicate types. */
+ return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
return MIN (align, 128);
}
+/* Implement target hook TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT. */
+static HOST_WIDE_INT
+aarch64_vectorize_preferred_vector_alignment (const_tree type)
+{
+ if (aarch64_sve_data_mode_p (TYPE_MODE (type)))
+ {
+ /* If the length of the vector is fixed, try to align to that length,
+ otherwise don't try to align at all. */
+ HOST_WIDE_INT result;
+ if (!BITS_PER_SVE_VECTOR.is_constant (&result))
+ result = TYPE_ALIGN (TREE_TYPE (type));
+ return result;
+ }
+ return TYPE_ALIGN (type);
+}
+
/* Implement target hook TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE. */
static bool
aarch64_simd_vector_alignment_reachable (const_tree type, bool is_packed)
@@ -12021,9 +13439,12 @@ aarch64_simd_vector_alignment_reachable (const_tree type, bool is_packed)
if (is_packed)
return false;
- /* We guarantee alignment for vectors up to 128-bits. */
- if (tree_int_cst_compare (TYPE_SIZE (type),
- bitsize_int (BIGGEST_ALIGNMENT)) > 0)
+ /* For fixed-length vectors, check that the vectorizer will aim for
+ full-vector alignment. This isn't true for generic GCC vectors
+ that are wider than the ABI maximum of 128 bits. */
+ if (TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
+ && (wi::to_widest (TYPE_SIZE (type))
+ != aarch64_vectorize_preferred_vector_alignment (type)))
return false;
/* Vectors whose size is <= BIGGEST_ALIGNMENT are naturally aligned. */
@@ -12268,12 +13689,9 @@ aarch64_expand_vector_init (rtx target, rtx vals)
static unsigned HOST_WIDE_INT
aarch64_shift_truncation_mask (machine_mode mode)
{
- return
- (!SHIFT_COUNT_TRUNCATED
- || aarch64_vector_mode_supported_p (mode)
- || aarch64_vect_struct_mode_p (mode))
- ? 0
- : (GET_MODE_UNIT_BITSIZE (mode) - 1);
+ if (!SHIFT_COUNT_TRUNCATED || aarch64_vector_data_mode_p (mode))
+ return 0;
+ return GET_MODE_UNIT_BITSIZE (mode) - 1;
}
/* Select a format to encode pointers in exception handling data. */
@@ -13250,6 +14668,67 @@ aarch64_output_scalar_simd_mov_immediate (rtx immediate, scalar_int_mode mode)
return aarch64_output_simd_mov_immediate (v_op, width);
}
+/* Return the output string to use for moving immediate CONST_VECTOR
+ into an SVE register. */
+
+char *
+aarch64_output_sve_mov_immediate (rtx const_vector)
+{
+ static char templ[40];
+ struct simd_immediate_info info;
+ char element_char;
+
+ bool is_valid = aarch64_simd_valid_immediate (const_vector, &info);
+ gcc_assert (is_valid);
+
+ element_char = sizetochar (GET_MODE_BITSIZE (info.elt_mode));
+
+ if (info.step)
+ {
+ snprintf (templ, sizeof (templ), "index\t%%0.%c, #"
+ HOST_WIDE_INT_PRINT_DEC ", #" HOST_WIDE_INT_PRINT_DEC,
+ element_char, INTVAL (info.value), INTVAL (info.step));
+ return templ;
+ }
+
+ if (GET_MODE_CLASS (info.elt_mode) == MODE_FLOAT)
+ {
+ if (aarch64_float_const_zero_rtx_p (info.value))
+ info.value = GEN_INT (0);
+ else
+ {
+ const int buf_size = 20;
+ char float_buf[buf_size] = {};
+ real_to_decimal_for_mode (float_buf,
+ CONST_DOUBLE_REAL_VALUE (info.value),
+ buf_size, buf_size, 1, info.elt_mode);
+
+ snprintf (templ, sizeof (templ), "fmov\t%%0.%c, #%s",
+ element_char, float_buf);
+ return templ;
+ }
+ }
+
+ snprintf (templ, sizeof (templ), "mov\t%%0.%c, #" HOST_WIDE_INT_PRINT_DEC,
+ element_char, INTVAL (info.value));
+ return templ;
+}
+
+/* Return the asm format for a PTRUE instruction whose destination has
+ mode MODE. SUFFIX is the element size suffix. */
+
+char *
+aarch64_output_ptrue (machine_mode mode, char suffix)
+{
+ unsigned int nunits;
+ static char buf[sizeof ("ptrue\t%0.N, vlNNNNN")];
+ if (GET_MODE_NUNITS (mode).is_constant (&nunits))
+ snprintf (buf, sizeof (buf), "ptrue\t%%0.%c, vl%d", suffix, nunits);
+ else
+ snprintf (buf, sizeof (buf), "ptrue\t%%0.%c, all", suffix);
+ return buf;
+}
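For example, with -msve-vector-bits=256 a byte-element predicate gives
"ptrue\t%0.b, vl32" (32 elements), while in the vector-length-agnostic case
the function returns "ptrue\t%0.b, all".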
+
/* Split operands into moves from op[1] + op[2] into op[0]. */
void
@@ -13304,13 +14783,12 @@ aarch64_split_combinev16qi (rtx operands[3])
/* vec_perm support. */
-#define MAX_VECT_LEN 16
-
struct expand_vec_perm_d
{
rtx target, op0, op1;
vec_perm_indices perm;
machine_mode vmode;
+ unsigned int vec_flags;
bool one_vector_p;
bool testing_p;
};
@@ -13392,6 +14870,74 @@ aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel,
aarch64_expand_vec_perm_1 (target, op0, op1, sel);
}
+/* Generate (set TARGET (unspec [OP0 OP1] CODE)). */
+
+static void
+emit_unspec2 (rtx target, int code, rtx op0, rtx op1)
+{
+ emit_insn (gen_rtx_SET (target,
+ gen_rtx_UNSPEC (GET_MODE (target),
+ gen_rtvec (2, op0, op1), code)));
+}
+
+/* Expand an SVE vec_perm with the given operands. */
+
+void
+aarch64_expand_sve_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
+{
+ machine_mode data_mode = GET_MODE (target);
+ machine_mode sel_mode = GET_MODE (sel);
+ /* Enforced by the pattern condition. */
+ int nunits = GET_MODE_NUNITS (sel_mode).to_constant ();
+
+ /* Note: vec_perm indices are supposed to wrap when they go beyond the
+ size of the two value vectors, i.e. the upper bits of the indices
+ are effectively ignored. SVE TBL instead produces 0 for any
+ out-of-range indices, so we need to modulo all the vec_perm indices
+ to ensure they are all in range. */
+ rtx sel_reg = force_reg (sel_mode, sel);
+
+ /* Check if the sel only references the first values vector. */
+ if (GET_CODE (sel) == CONST_VECTOR
+ && aarch64_const_vec_all_in_range_p (sel, 0, nunits - 1))
+ {
+ emit_unspec2 (target, UNSPEC_TBL, op0, sel_reg);
+ return;
+ }
+
+ /* Check if the two values vectors are the same. */
+ if (rtx_equal_p (op0, op1))
+ {
+ rtx max_sel = aarch64_simd_gen_const_vector_dup (sel_mode, nunits - 1);
+ rtx sel_mod = expand_simple_binop (sel_mode, AND, sel_reg, max_sel,
+ NULL, 0, OPTAB_DIRECT);
+ emit_unspec2 (target, UNSPEC_TBL, op0, sel_mod);
+ return;
+ }
+
+ /* Run TBL on each value vector and combine the results. */
+
+ rtx res0 = gen_reg_rtx (data_mode);
+ rtx res1 = gen_reg_rtx (data_mode);
+ rtx neg_num_elems = aarch64_simd_gen_const_vector_dup (sel_mode, -nunits);
+ if (GET_CODE (sel) != CONST_VECTOR
+ || !aarch64_const_vec_all_in_range_p (sel, 0, 2 * nunits - 1))
+ {
+ rtx max_sel = aarch64_simd_gen_const_vector_dup (sel_mode,
+ 2 * nunits - 1);
+ sel_reg = expand_simple_binop (sel_mode, AND, sel_reg, max_sel,
+ NULL, 0, OPTAB_DIRECT);
+ }
+ emit_unspec2 (res0, UNSPEC_TBL, op0, sel_reg);
+ rtx sel_sub = expand_simple_binop (sel_mode, PLUS, sel_reg, neg_num_elems,
+ NULL, 0, OPTAB_DIRECT);
+ emit_unspec2 (res1, UNSPEC_TBL, op1, sel_sub);
+ if (GET_MODE_CLASS (data_mode) == MODE_VECTOR_INT)
+ emit_insn (gen_rtx_SET (target, gen_rtx_IOR (data_mode, res0, res1)));
+ else
+ emit_unspec2 (target, UNSPEC_IORF, res0, res1);
+}
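A scalar model of the combine above, assuming TBL-style selection that yields
zero for out-of-range indices; the vector length and element values are
illustrative only:

  #include <stdio.h>

  #define NELT 4

  /* Behaves like SVE TBL on one value vector: out-of-range selectors
     give zero.  */
  static int
  tbl (const int *vec, int sel)
  {
    return (sel >= 0 && sel < NELT) ? vec[sel] : 0;
  }

  int
  main (void)
  {
    int op0[NELT] = { 10, 11, 12, 13 };
    int op1[NELT] = { 20, 21, 22, 23 };
    int sel[NELT] = { 6, 1, 9, 3 };    /* 9 wraps to 1 in vec_perm terms.  */

    for (int i = 0; i < NELT; i++)
      {
        int s = sel[i] & (2 * NELT - 1);    /* AND with 2 * nunits - 1.  */
        int res0 = tbl (op0, s);            /* TBL on the first vector.  */
        int res1 = tbl (op1, s - NELT);     /* TBL on the adjusted selector.  */
        printf ("%d ", res0 | res1);        /* One of the two is always 0.  */
      }
    printf ("\n");                          /* Prints "22 11 11 13".  */
    return 0;
  }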
+
/* Recognize patterns suitable for the TRN instructions. */
static bool
aarch64_evpc_trn (struct expand_vec_perm_d *d)
@@ -13418,7 +14964,9 @@ aarch64_evpc_trn (struct expand_vec_perm_d *d)
in0 = d->op0;
in1 = d->op1;
- if (BYTES_BIG_ENDIAN)
+ /* We don't need a big-endian lane correction for SVE; see the comment
+ at the head of aarch64-sve.md for details. */
+ if (BYTES_BIG_ENDIAN && d->vec_flags == VEC_ADVSIMD)
{
x = in0, in0 = in1, in1 = x;
odd = !odd;
@@ -13454,7 +15002,9 @@ aarch64_evpc_uzp (struct expand_vec_perm_d *d)
in0 = d->op0;
in1 = d->op1;
- if (BYTES_BIG_ENDIAN)
+ /* We don't need a big-endian lane correction for SVE; see the comment
+ at the head of aarch64-sve.md for details. */
+ if (BYTES_BIG_ENDIAN && d->vec_flags == VEC_ADVSIMD)
{
x = in0, in0 = in1, in1 = x;
odd = !odd;
@@ -13493,7 +15043,9 @@ aarch64_evpc_zip (struct expand_vec_perm_d *d)
in0 = d->op0;
in1 = d->op1;
- if (BYTES_BIG_ENDIAN)
+ /* We don't need a big-endian lane correction for SVE; see the comment
+ at the head of aarch64-sve.md for details. */
+ if (BYTES_BIG_ENDIAN && d->vec_flags == VEC_ADVSIMD)
{
x = in0, in0 = in1, in1 = x;
high = !high;
@@ -13515,7 +15067,8 @@ aarch64_evpc_ext (struct expand_vec_perm_d *d)
/* The first element always refers to the first vector.
Check if the extracted indices are increasing by one. */
- if (!d->perm[0].is_constant (&location)
+ if (d->vec_flags == VEC_SVE_PRED
+ || !d->perm[0].is_constant (&location)
|| !d->perm.series_p (0, 1, location, 1))
return false;
@@ -13524,9 +15077,11 @@ aarch64_evpc_ext (struct expand_vec_perm_d *d)
return true;
/* The case where (location == 0) is a no-op for both big- and little-endian,
- and is removed by the mid-end at optimization levels -O1 and higher. */
+ and is removed by the mid-end at optimization levels -O1 and higher.
- if (BYTES_BIG_ENDIAN && (location != 0))
+ We don't need a big-endian lane correction for SVE; see the comment
+ at the head of aarch64-sve.md for details. */
+ if (BYTES_BIG_ENDIAN && location != 0 && d->vec_flags == VEC_ADVSIMD)
{
/* After setup, we want the high elements of the first vector (stored
at the LSB end of the register), and the low elements of the second
@@ -13546,25 +15101,37 @@ aarch64_evpc_ext (struct expand_vec_perm_d *d)
return true;
}
-/* Recognize patterns for the REV insns. */
+/* Recognize patterns for the REV{64,32,16} insns, which reverse elements
+ within each 64-bit, 32-bit or 16-bit granule. */
static bool
-aarch64_evpc_rev (struct expand_vec_perm_d *d)
+aarch64_evpc_rev_local (struct expand_vec_perm_d *d)
{
HOST_WIDE_INT diff;
unsigned int i, size, unspec;
+ machine_mode pred_mode;
- if (!d->one_vector_p
+ if (d->vec_flags == VEC_SVE_PRED
+ || !d->one_vector_p
|| !d->perm[0].is_constant (&diff))
return false;
size = (diff + 1) * GET_MODE_UNIT_SIZE (d->vmode);
if (size == 8)
- unspec = UNSPEC_REV64;
+ {
+ unspec = UNSPEC_REV64;
+ pred_mode = VNx2BImode;
+ }
else if (size == 4)
- unspec = UNSPEC_REV32;
+ {
+ unspec = UNSPEC_REV32;
+ pred_mode = VNx4BImode;
+ }
else if (size == 2)
- unspec = UNSPEC_REV16;
+ {
+ unspec = UNSPEC_REV16;
+ pred_mode = VNx8BImode;
+ }
else
return false;
@@ -13577,8 +15144,37 @@ aarch64_evpc_rev (struct expand_vec_perm_d *d)
if (d->testing_p)
return true;
- emit_set_insn (d->target, gen_rtx_UNSPEC (d->vmode, gen_rtvec (1, d->op0),
- unspec));
+ rtx src = gen_rtx_UNSPEC (d->vmode, gen_rtvec (1, d->op0), unspec);
+ if (d->vec_flags == VEC_SVE_DATA)
+ {
+ rtx pred = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+ src = gen_rtx_UNSPEC (d->vmode, gen_rtvec (2, pred, src),
+ UNSPEC_MERGE_PTRUE);
+ }
+ emit_set_insn (d->target, src);
+ return true;
+}
+
+/* Recognize patterns for the REV insn, which reverses elements within
+ a full vector. */
+
+static bool
+aarch64_evpc_rev_global (struct expand_vec_perm_d *d)
+{
+ poly_uint64 nelt = d->perm.length ();
+
+ if (!d->one_vector_p || d->vec_flags != VEC_SVE_DATA)
+ return false;
+
+ if (!d->perm.series_p (0, 1, nelt - 1, -1))
+ return false;
+
+ /* Success! */
+ if (d->testing_p)
+ return true;
+
+ rtx src = gen_rtx_UNSPEC (d->vmode, gen_rtvec (1, d->op0), UNSPEC_REV);
+ emit_set_insn (d->target, src);
return true;
}
@@ -13591,10 +15187,14 @@ aarch64_evpc_dup (struct expand_vec_perm_d *d)
machine_mode vmode = d->vmode;
rtx lane;
- if (d->perm.encoding ().encoded_nelts () != 1
+ if (d->vec_flags == VEC_SVE_PRED
+ || d->perm.encoding ().encoded_nelts () != 1
|| !d->perm[0].is_constant (&elt))
return false;
+ if (d->vec_flags == VEC_SVE_DATA && elt >= 64 * GET_MODE_UNIT_SIZE (vmode))
+ return false;
+
/* Success! */
if (d->testing_p)
return true;
@@ -13616,7 +15216,7 @@ aarch64_evpc_dup (struct expand_vec_perm_d *d)
static bool
aarch64_evpc_tbl (struct expand_vec_perm_d *d)
{
- rtx rperm[MAX_VECT_LEN], sel;
+ rtx rperm[MAX_COMPILE_TIME_VEC_BYTES], sel;
machine_mode vmode = d->vmode;
/* Make sure that the indices are constant. */
@@ -13652,6 +15252,27 @@ aarch64_evpc_tbl (struct expand_vec_perm_d *d)
return true;
}
+/* Try to implement D using an SVE TBL instruction. */
+
+static bool
+aarch64_evpc_sve_tbl (struct expand_vec_perm_d *d)
+{
+ unsigned HOST_WIDE_INT nelt;
+
+ /* Permuting two variable-length vectors could overflow the
+ index range. */
+ if (!d->one_vector_p && !d->perm.length ().is_constant (&nelt))
+ return false;
+
+ if (d->testing_p)
+ return true;
+
+ machine_mode sel_mode = mode_for_int_vector (d->vmode).require ();
+ rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
+ aarch64_expand_sve_vec_perm (d->target, d->op0, d->op1, sel);
+ return true;
+}
+
static bool
aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
{
@@ -13665,9 +15286,14 @@ aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
std::swap (d->op0, d->op1);
}
- if (TARGET_SIMD && known_gt (nelt, 1))
+ if ((d->vec_flags == VEC_ADVSIMD
+ || d->vec_flags == VEC_SVE_DATA
+ || d->vec_flags == VEC_SVE_PRED)
+ && known_gt (nelt, 1))
{
- if (aarch64_evpc_rev (d))
+ if (aarch64_evpc_rev_local (d))
+ return true;
+ else if (aarch64_evpc_rev_global (d))
return true;
else if (aarch64_evpc_ext (d))
return true;
@@ -13679,7 +15305,10 @@ aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
return true;
else if (aarch64_evpc_trn (d))
return true;
- return aarch64_evpc_tbl (d);
+ if (d->vec_flags == VEC_SVE_DATA)
+ return aarch64_evpc_sve_tbl (d);
+ else if (d->vec_flags == VEC_ADVSIMD)
+ return aarch64_evpc_tbl (d);
}
return false;
}
@@ -13711,6 +15340,7 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2,
sel.nelts_per_input ());
d.vmode = vmode;
+ d.vec_flags = aarch64_classify_vector_mode (d.vmode);
d.target = target;
d.op0 = op0;
d.op1 = op1;
@@ -13749,6 +15379,272 @@ aarch64_reverse_mask (machine_mode mode, unsigned int nunits)
return force_reg (V16QImode, mask);
}
+/* Return true if X is a valid second operand for the SVE instruction
+ that implements integer comparison OP_CODE. */
+
+static bool
+aarch64_sve_cmp_operand_p (rtx_code op_code, rtx x)
+{
+ if (register_operand (x, VOIDmode))
+ return true;
+
+ switch (op_code)
+ {
+ case LTU:
+ case LEU:
+ case GEU:
+ case GTU:
+ return aarch64_sve_cmp_immediate_p (x, false);
+ case LT:
+ case LE:
+ case GE:
+ case GT:
+ case NE:
+ case EQ:
+ return aarch64_sve_cmp_immediate_p (x, true);
+ default:
+ gcc_unreachable ();
+ }
+}
+
+/* Return the UNSPEC_COND_* code for comparison CODE. */
+
+static unsigned int
+aarch64_unspec_cond_code (rtx_code code)
+{
+ switch (code)
+ {
+ case NE:
+ return UNSPEC_COND_NE;
+ case EQ:
+ return UNSPEC_COND_EQ;
+ case LT:
+ return UNSPEC_COND_LT;
+ case GT:
+ return UNSPEC_COND_GT;
+ case LE:
+ return UNSPEC_COND_LE;
+ case GE:
+ return UNSPEC_COND_GE;
+ case LTU:
+ return UNSPEC_COND_LO;
+ case GTU:
+ return UNSPEC_COND_HI;
+ case LEU:
+ return UNSPEC_COND_LS;
+ case GEU:
+ return UNSPEC_COND_HS;
+ case UNORDERED:
+ return UNSPEC_COND_UO;
+ default:
+ gcc_unreachable ();
+ }
+}
+
+/* Return an (unspec:PRED_MODE [PRED OP0 OP1] UNSPEC_COND_<X>) expression,
+ where <X> is the operation associated with comparison CODE. */
+
+static rtx
+aarch64_gen_unspec_cond (rtx_code code, machine_mode pred_mode,
+ rtx pred, rtx op0, rtx op1)
+{
+ rtvec vec = gen_rtvec (3, pred, op0, op1);
+ return gen_rtx_UNSPEC (pred_mode, vec, aarch64_unspec_cond_code (code));
+}
+
+/* Expand an SVE integer comparison:
+
+ TARGET = CODE (OP0, OP1). */
+
+void
+aarch64_expand_sve_vec_cmp_int (rtx target, rtx_code code, rtx op0, rtx op1)
+{
+ machine_mode pred_mode = GET_MODE (target);
+ machine_mode data_mode = GET_MODE (op0);
+
+ if (!aarch64_sve_cmp_operand_p (code, op1))
+ op1 = force_reg (data_mode, op1);
+
+ rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+ rtx unspec = aarch64_gen_unspec_cond (code, pred_mode, ptrue, op0, op1);
+ emit_insn (gen_set_clobber_cc (target, unspec));
+}
+
+/* Emit an instruction:
+
+ (set TARGET (unspec:PRED_MODE [PRED OP0 OP1] UNSPEC_COND_<X>))
+
+ where <X> is the operation associated with comparison CODE. */
+
+static void
+aarch64_emit_unspec_cond (rtx target, rtx_code code, machine_mode pred_mode,
+ rtx pred, rtx op0, rtx op1)
+{
+ rtx unspec = aarch64_gen_unspec_cond (code, pred_mode, pred, op0, op1);
+ emit_set_insn (target, unspec);
+}
+
+/* Emit:
+
+ (set TMP1 (unspec:PRED_MODE [PTRUE OP0 OP1] UNSPEC_COND_<X1>))
+ (set TMP2 (unspec:PRED_MODE [PTRUE OP0 OP1] UNSPEC_COND_<X2>))
+ (set TARGET (and:PRED_MODE (ior:PRED_MODE TMP1 TMP2) PTRUE))
+
+ where <Xi> is the operation associated with comparison CODEi. */
+
+static void
+aarch64_emit_unspec_cond_or (rtx target, rtx_code code1, rtx_code code2,
+ machine_mode pred_mode, rtx ptrue,
+ rtx op0, rtx op1)
+{
+ rtx tmp1 = gen_reg_rtx (pred_mode);
+ aarch64_emit_unspec_cond (tmp1, code1, pred_mode, ptrue, op0, op1);
+ rtx tmp2 = gen_reg_rtx (pred_mode);
+ aarch64_emit_unspec_cond (tmp2, code2, pred_mode, ptrue, op0, op1);
+ emit_set_insn (target, gen_rtx_AND (pred_mode,
+ gen_rtx_IOR (pred_mode, tmp1, tmp2),
+ ptrue));
+}
+
+/* If CAN_INVERT_P, emit an instruction:
+
+ (set TARGET (unspec:PRED_MODE [PRED OP0 OP1] UNSPEC_COND_<X>))
+
+ where <X> is the operation associated with comparison CODE. Otherwise
+ emit:
+
+ (set TMP (unspec:PRED_MODE [PRED OP0 OP1] UNSPEC_COND_<X>))
+ (set TARGET (and:PRED_MODE (not:PRED_MODE TMP) PTRUE))
+
+ where the second instruction sets TARGET to the inverse of TMP. */
+
+static void
+aarch64_emit_inverted_unspec_cond (rtx target, rtx_code code,
+ machine_mode pred_mode, rtx ptrue, rtx pred,
+ rtx op0, rtx op1, bool can_invert_p)
+{
+ if (can_invert_p)
+ aarch64_emit_unspec_cond (target, code, pred_mode, pred, op0, op1);
+ else
+ {
+ rtx tmp = gen_reg_rtx (pred_mode);
+ aarch64_emit_unspec_cond (tmp, code, pred_mode, pred, op0, op1);
+ emit_set_insn (target, gen_rtx_AND (pred_mode,
+ gen_rtx_NOT (pred_mode, tmp),
+ ptrue));
+ }
+}
+
+/* Expand an SVE floating-point comparison:
+
+ TARGET = CODE (OP0, OP1)
+
+ If CAN_INVERT_P is true, the caller can also handle inverted results;
+ return true if the result is in fact inverted. */
+
+bool
+aarch64_expand_sve_vec_cmp_float (rtx target, rtx_code code,
+ rtx op0, rtx op1, bool can_invert_p)
+{
+ machine_mode pred_mode = GET_MODE (target);
+ machine_mode data_mode = GET_MODE (op0);
+
+ rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+ switch (code)
+ {
+ case UNORDERED:
+ /* UNORDERED has no immediate form. */
+ op1 = force_reg (data_mode, op1);
+ aarch64_emit_unspec_cond (target, code, pred_mode, ptrue, op0, op1);
+ return false;
+
+ case LT:
+ case LE:
+ case GT:
+ case GE:
+ case EQ:
+ case NE:
+ /* There is native support for the comparison. */
+ aarch64_emit_unspec_cond (target, code, pred_mode, ptrue, op0, op1);
+ return false;
+
+ case ORDERED:
+ /* There is native support for the inverse comparison. */
+ op1 = force_reg (data_mode, op1);
+ aarch64_emit_inverted_unspec_cond (target, UNORDERED,
+ pred_mode, ptrue, ptrue, op0, op1,
+ can_invert_p);
+ return can_invert_p;
+
+ case LTGT:
+ /* This is a trapping operation (LT or GT). */
+ aarch64_emit_unspec_cond_or (target, LT, GT, pred_mode, ptrue, op0, op1);
+ return false;
+
+ case UNEQ:
+ if (!flag_trapping_math)
+ {
+ /* This would trap for signaling NaNs. */
+ op1 = force_reg (data_mode, op1);
+ aarch64_emit_unspec_cond_or (target, UNORDERED, EQ,
+ pred_mode, ptrue, op0, op1);
+ return false;
+ }
+ /* fall through */
+
+ case UNLT:
+ case UNLE:
+ case UNGT:
+ case UNGE:
+ {
+ rtx ordered = ptrue;
+ if (flag_trapping_math)
+ {
+ /* Only compare the elements that are known to be ordered. */
+ ordered = gen_reg_rtx (pred_mode);
+ op1 = force_reg (data_mode, op1);
+ aarch64_emit_inverted_unspec_cond (ordered, UNORDERED, pred_mode,
+ ptrue, ptrue, op0, op1, false);
+ }
+ if (code == UNEQ)
+ code = NE;
+ else
+ code = reverse_condition_maybe_unordered (code);
+ aarch64_emit_inverted_unspec_cond (target, code, pred_mode, ptrue,
+ ordered, op0, op1, can_invert_p);
+ return can_invert_p;
+ }
+
+ default:
+ gcc_unreachable ();
+ }
+}
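For example, UNGE with -ftrapping-math is handled by computing the ordered
lanes via an inverted UNORDERED comparison, comparing LT (the reverse of UNGE)
on just those lanes, and then inverting that result, so that the final
predicate selects the unordered lanes together with the ordered lanes where
OP0 >= OP1.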
+
+/* Expand an SVE vcond pattern with operands OPS. DATA_MODE is the mode
+ of the data being selected and CMP_MODE is the mode of the values being
+ compared. */
+
+void
+aarch64_expand_sve_vcond (machine_mode data_mode, machine_mode cmp_mode,
+ rtx *ops)
+{
+ machine_mode pred_mode
+ = aarch64_get_mask_mode (GET_MODE_NUNITS (cmp_mode),
+ GET_MODE_SIZE (cmp_mode)).require ();
+ rtx pred = gen_reg_rtx (pred_mode);
+ if (FLOAT_MODE_P (cmp_mode))
+ {
+ if (aarch64_expand_sve_vec_cmp_float (pred, GET_CODE (ops[3]),
+ ops[4], ops[5], true))
+ std::swap (ops[1], ops[2]);
+ }
+ else
+ aarch64_expand_sve_vec_cmp_int (pred, GET_CODE (ops[3]), ops[4], ops[5]);
+
+ rtvec vec = gen_rtvec (3, pred, ops[1], ops[2]);
+ emit_set_insn (ops[0], gen_rtx_UNSPEC (data_mode, vec, UNSPEC_SEL));
+}
+
/* Implement TARGET_MODES_TIEABLE_P. In principle we should always return
true. However due to issues with register allocation it is preferable
to avoid tieing integer scalar and FP scalar modes. Executing integer
@@ -13765,8 +15661,12 @@ aarch64_modes_tieable_p (machine_mode mode1, machine_mode mode2)
/* We specifically want to allow elements of "structure" modes to
be tieable to the structure. This more general condition allows
- other rarer situations too. */
- if (aarch64_vector_mode_p (mode1) && aarch64_vector_mode_p (mode2))
+ other rarer situations too. The reason we don't extend this to
+ predicate modes is that there are no predicate structure modes
+ nor any specific instructions for extracting part of a predicate
+ register. */
+ if (aarch64_vector_data_mode_p (mode1)
+ && aarch64_vector_data_mode_p (mode2))
return true;
/* Also allow any scalar modes with vectors. */
@@ -15020,6 +16920,19 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
}
}
+/* Implement the TARGET_DWARF_POLY_INDETERMINATE_VALUE hook. */
+
+static unsigned int
+aarch64_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
+ int *offset)
+{
+ /* Polynomial invariant 1 == (VG / 2) - 1. */
+ gcc_assert (i == 1);
+ *factor = 2;
+ *offset = 1;
+ return AARCH64_DWARF_VG;
+}
+
/* Implement TARGET_LIBGCC_FLOATING_POINT_MODE_SUPPORTED_P - return TRUE
if MODE is HFmode, and punt to the generic implementation otherwise. */
@@ -15112,6 +17025,38 @@ aarch64_sched_can_speculate_insn (rtx_insn *insn)
}
}
+/* Implement TARGET_COMPUTE_PRESSURE_CLASSES. */
+
+static int
+aarch64_compute_pressure_classes (reg_class *classes)
+{
+ int i = 0;
+ classes[i++] = GENERAL_REGS;
+ classes[i++] = FP_REGS;
+ /* PR_REGS isn't a useful pressure class because many predicate pseudo
+ registers need to go in PR_LO_REGS at some point during their
+ lifetime. Splitting it into two halves has the effect of making
+ all predicates count against PR_LO_REGS, so that we try whenever
+ possible to restrict the number of live predicates to 8. This
+ greatly reduces the amount of spilling in certain loops. */
+ classes[i++] = PR_LO_REGS;
+ classes[i++] = PR_HI_REGS;
+ return i;
+}
+
+/* Implement TARGET_CAN_CHANGE_MODE_CLASS. */
+
+static bool
+aarch64_can_change_mode_class (machine_mode from,
+ machine_mode to, reg_class_t)
+{
+ /* See the comment at the head of aarch64-sve.md for details. */
+ if (BYTES_BIG_ENDIAN
+ && (aarch64_sve_data_mode_p (from) != aarch64_sve_data_mode_p (to)))
+ return false;
+ return true;
+}
+
/* Target-specific selftests. */
#if CHECKING_P
@@ -15260,6 +17205,11 @@ aarch64_run_selftests (void)
#undef TARGET_FUNCTION_ARG_PADDING
#define TARGET_FUNCTION_ARG_PADDING aarch64_function_arg_padding
+#undef TARGET_GET_RAW_RESULT_MODE
+#define TARGET_GET_RAW_RESULT_MODE aarch64_get_reg_raw_mode
+#undef TARGET_GET_RAW_ARG_MODE
+#define TARGET_GET_RAW_ARG_MODE aarch64_get_reg_raw_mode
+
#undef TARGET_FUNCTION_OK_FOR_SIBCALL
#define TARGET_FUNCTION_OK_FOR_SIBCALL aarch64_function_ok_for_sibcall
@@ -15468,6 +17418,9 @@ aarch64_libgcc_floating_mode_supported_p
#undef TARGET_VECTOR_ALIGNMENT
#define TARGET_VECTOR_ALIGNMENT aarch64_simd_vector_alignment
+#undef TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT
+#define TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT \
+ aarch64_vectorize_preferred_vector_alignment
#undef TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE
#define TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE \
aarch64_simd_vector_alignment_reachable
@@ -15478,6 +17431,9 @@ aarch64_libgcc_floating_mode_supported_p
#define TARGET_VECTORIZE_VEC_PERM_CONST \
aarch64_vectorize_vec_perm_const
+#undef TARGET_VECTORIZE_GET_MASK_MODE
+#define TARGET_VECTORIZE_GET_MASK_MODE aarch64_get_mask_mode
+
#undef TARGET_INIT_LIBFUNCS
#define TARGET_INIT_LIBFUNCS aarch64_init_libfuncs
@@ -15532,6 +17488,10 @@ aarch64_libgcc_floating_mode_supported_p
#undef TARGET_OMIT_STRUCT_RETURN_REG
#define TARGET_OMIT_STRUCT_RETURN_REG true
+#undef TARGET_DWARF_POLY_INDETERMINATE_VALUE
+#define TARGET_DWARF_POLY_INDETERMINATE_VALUE \
+ aarch64_dwarf_poly_indeterminate_value
+
/* The architecture reserves bits 0 and 1 so use bit 2 for descriptors. */
#undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 4
@@ -15551,6 +17511,12 @@ aarch64_libgcc_floating_mode_supported_p
#undef TARGET_CONSTANT_ALIGNMENT
#define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment
+#undef TARGET_COMPUTE_PRESSURE_CLASSES
+#define TARGET_COMPUTE_PRESSURE_CLASSES aarch64_compute_pressure_classes
+
+#undef TARGET_CAN_CHANGE_MODE_CLASS
+#define TARGET_CAN_CHANGE_MODE_CLASS aarch64_can_change_mode_class
+
#if CHECKING_P
#undef TARGET_RUN_TARGET_SELFTESTS
#define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 98e4517..fc99fc4 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -144,18 +144,19 @@ extern unsigned aarch64_architecture_version;
/* ARMv8.2-A architecture extensions. */
#define AARCH64_FL_V8_2 (1 << 8) /* Has ARMv8.2-A features. */
#define AARCH64_FL_F16 (1 << 9) /* Has ARMv8.2-A FP16 extensions. */
+#define AARCH64_FL_SVE (1 << 10) /* Has Scalable Vector Extensions. */
/* ARMv8.3-A architecture extensions. */
-#define AARCH64_FL_V8_3 (1 << 10) /* Has ARMv8.3-A features. */
-#define AARCH64_FL_RCPC (1 << 11) /* Has support for RCpc model. */
-#define AARCH64_FL_DOTPROD (1 << 12) /* Has ARMv8.2-A Dot Product ins. */
+#define AARCH64_FL_V8_3 (1 << 11) /* Has ARMv8.3-A features. */
+#define AARCH64_FL_RCPC (1 << 12) /* Has support for RCpc model. */
+#define AARCH64_FL_DOTPROD (1 << 13) /* Has ARMv8.2-A Dot Product ins. */
/* New flags to split crypto into aes and sha2. */
-#define AARCH64_FL_AES (1 << 13) /* Has Crypto AES. */
-#define AARCH64_FL_SHA2 (1 << 14) /* Has Crypto SHA2. */
+#define AARCH64_FL_AES (1 << 14) /* Has Crypto AES. */
+#define AARCH64_FL_SHA2 (1 << 15) /* Has Crypto SHA2. */
/* ARMv8.4-A architecture extensions. */
-#define AARCH64_FL_V8_4 (1 << 15) /* Has ARMv8.4-A features. */
-#define AARCH64_FL_SM4 (1 << 16) /* Has ARMv8.4-A SM3 and SM4. */
-#define AARCH64_FL_SHA3 (1 << 17) /* Has ARMv8.4-a SHA3 and SHA512. */
-#define AARCH64_FL_F16FML (1 << 18) /* Has ARMv8.4-a FP16 extensions. */
+#define AARCH64_FL_V8_4 (1 << 16) /* Has ARMv8.4-A features. */
+#define AARCH64_FL_SM4 (1 << 17) /* Has ARMv8.4-A SM3 and SM4. */
+#define AARCH64_FL_SHA3 (1 << 18) /* Has ARMv8.4-a SHA3 and SHA512. */
+#define AARCH64_FL_F16FML (1 << 19) /* Has ARMv8.4-a FP16 extensions. */
/* Has FP and SIMD. */
#define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
@@ -186,6 +187,7 @@ extern unsigned aarch64_architecture_version;
#define AARCH64_ISA_RDMA (aarch64_isa_flags & AARCH64_FL_RDMA)
#define AARCH64_ISA_V8_2 (aarch64_isa_flags & AARCH64_FL_V8_2)
#define AARCH64_ISA_F16 (aarch64_isa_flags & AARCH64_FL_F16)
+#define AARCH64_ISA_SVE (aarch64_isa_flags & AARCH64_FL_SVE)
#define AARCH64_ISA_V8_3 (aarch64_isa_flags & AARCH64_FL_V8_3)
#define AARCH64_ISA_DOTPROD (aarch64_isa_flags & AARCH64_FL_DOTPROD)
#define AARCH64_ISA_AES (aarch64_isa_flags & AARCH64_FL_AES)
@@ -226,6 +228,9 @@ extern unsigned aarch64_architecture_version;
/* Dot Product is an optional extension to AdvSIMD enabled through +dotprod. */
#define TARGET_DOTPROD (TARGET_SIMD && AARCH64_ISA_DOTPROD)
+/* SVE instructions, enabled through +sve. */
+#define TARGET_SVE (AARCH64_ISA_SVE)
+
/* ARMv8.3-A features. */
#define TARGET_ARMV8_3 (AARCH64_ISA_V8_3)
@@ -286,8 +291,17 @@ extern unsigned aarch64_architecture_version;
V0-V7 Parameter/result registers
The vector register V0 holds scalar B0, H0, S0 and D0 in its least
- significant bits. Unlike AArch32 S1 is not packed into D0,
- etc. */
+ significant bits. Unlike AArch32 S1 is not packed into D0, etc.
+
+ P0-P7 Predicate low registers: valid in all predicate contexts
+ P8-P15 Predicate high registers: used as scratch space
+
+ VG Pseudo "vector granules" register
+
+ VG is the number of 64-bit elements in an SVE vector. We define
+ it as a hard register so that we can easily map it to the DWARF VG
+ register. GCC internally uses the poly_int variable aarch64_sve_vg
+ instead. */
/* Note that we don't mark X30 as a call-clobbered register. The idea is
that it's really the call instructions themselves which clobber X30.
@@ -308,7 +322,9 @@ extern unsigned aarch64_architecture_version;
0, 0, 0, 0, 0, 0, 0, 0, /* V8 - V15 */ \
0, 0, 0, 0, 0, 0, 0, 0, /* V16 - V23 */ \
0, 0, 0, 0, 0, 0, 0, 0, /* V24 - V31 */ \
- 1, 1, 1, /* SFP, AP, CC */ \
+ 1, 1, 1, 1, /* SFP, AP, CC, VG */ \
+ 0, 0, 0, 0, 0, 0, 0, 0, /* P0 - P7 */ \
+ 0, 0, 0, 0, 0, 0, 0, 0, /* P8 - P15 */ \
}
#define CALL_USED_REGISTERS \
@@ -321,7 +337,9 @@ extern unsigned aarch64_architecture_version;
0, 0, 0, 0, 0, 0, 0, 0, /* V8 - V15 */ \
1, 1, 1, 1, 1, 1, 1, 1, /* V16 - V23 */ \
1, 1, 1, 1, 1, 1, 1, 1, /* V24 - V31 */ \
- 1, 1, 1, /* SFP, AP, CC */ \
+ 1, 1, 1, 1, /* SFP, AP, CC, VG */ \
+ 1, 1, 1, 1, 1, 1, 1, 1, /* P0 - P7 */ \
+ 1, 1, 1, 1, 1, 1, 1, 1, /* P8 - P15 */ \
}
#define REGISTER_NAMES \
@@ -334,7 +352,9 @@ extern unsigned aarch64_architecture_version;
"v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15", \
"v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23", \
"v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31", \
- "sfp", "ap", "cc", \
+ "sfp", "ap", "cc", "vg", \
+ "p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7", \
+ "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15", \
}
/* Generate the register aliases for core register N */
@@ -345,7 +365,8 @@ extern unsigned aarch64_architecture_version;
{"d" # N, V0_REGNUM + (N)}, \
{"s" # N, V0_REGNUM + (N)}, \
{"h" # N, V0_REGNUM + (N)}, \
- {"b" # N, V0_REGNUM + (N)}
+ {"b" # N, V0_REGNUM + (N)}, \
+ {"z" # N, V0_REGNUM + (N)}
/* Provide aliases for all of the ISA defined register name forms.
These aliases are convenient for use in the clobber lists of inline
@@ -387,7 +408,7 @@ extern unsigned aarch64_architecture_version;
#define FRAME_POINTER_REGNUM SFP_REGNUM
#define STACK_POINTER_REGNUM SP_REGNUM
#define ARG_POINTER_REGNUM AP_REGNUM
-#define FIRST_PSEUDO_REGISTER 67
+#define FIRST_PSEUDO_REGISTER (P15_REGNUM + 1)
/* The number of (integer) argument register available. */
#define NUM_ARG_REGS 8
@@ -408,6 +429,8 @@ extern unsigned aarch64_architecture_version;
#define AARCH64_DWARF_NUMBER_R 31
#define AARCH64_DWARF_SP 31
+#define AARCH64_DWARF_VG 46
+#define AARCH64_DWARF_P0 48
#define AARCH64_DWARF_V0 64
/* The number of V registers. */
@@ -472,6 +495,12 @@ extern unsigned aarch64_architecture_version;
#define FP_LO_REGNUM_P(REGNO) \
(((unsigned) (REGNO - V0_REGNUM)) <= (V15_REGNUM - V0_REGNUM))
+#define PR_REGNUM_P(REGNO)\
+ (((unsigned) (REGNO - P0_REGNUM)) <= (P15_REGNUM - P0_REGNUM))
+
+#define PR_LO_REGNUM_P(REGNO)\
+ (((unsigned) (REGNO - P0_REGNUM)) <= (P7_REGNUM - P0_REGNUM))
+
/* Register and constant classes. */
@@ -485,6 +514,9 @@ enum reg_class
FP_LO_REGS,
FP_REGS,
POINTER_AND_FP_REGS,
+ PR_LO_REGS,
+ PR_HI_REGS,
+ PR_REGS,
ALL_REGS,
LIM_REG_CLASSES /* Last */
};
@@ -501,6 +533,9 @@ enum reg_class
"FP_LO_REGS", \
"FP_REGS", \
"POINTER_AND_FP_REGS", \
+ "PR_LO_REGS", \
+ "PR_HI_REGS", \
+ "PR_REGS", \
"ALL_REGS" \
}
@@ -514,7 +549,10 @@ enum reg_class
{ 0x00000000, 0x0000ffff, 0x00000000 }, /* FP_LO_REGS */ \
{ 0x00000000, 0xffffffff, 0x00000000 }, /* FP_REGS */ \
{ 0xffffffff, 0xffffffff, 0x00000003 }, /* POINTER_AND_FP_REGS */\
- { 0xffffffff, 0xffffffff, 0x00000007 } /* ALL_REGS */ \
+ { 0x00000000, 0x00000000, 0x00000ff0 }, /* PR_LO_REGS */ \
+ { 0x00000000, 0x00000000, 0x000ff000 }, /* PR_HI_REGS */ \
+ { 0x00000000, 0x00000000, 0x000ffff0 }, /* PR_REGS */ \
+ { 0xffffffff, 0xffffffff, 0x000fffff } /* ALL_REGS */ \
}
#define REGNO_REG_CLASS(REGNO) aarch64_regno_regclass (REGNO)
@@ -998,4 +1036,28 @@ extern tree aarch64_fp16_ptr_type_node;
#define LIBGCC2_UNWIND_ATTRIBUTE \
__attribute__((optimize ("no-omit-frame-pointer")))
+#ifndef USED_FOR_TARGET
+extern poly_uint16 aarch64_sve_vg;
+
+/* The number of bits and bytes in an SVE vector. */
+#define BITS_PER_SVE_VECTOR (poly_uint16 (aarch64_sve_vg * 64))
+#define BYTES_PER_SVE_VECTOR (poly_uint16 (aarch64_sve_vg * 8))
+
+/* The number of bytes in an SVE predicate. */
+#define BYTES_PER_SVE_PRED aarch64_sve_vg
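+
+/* For example, with -msve-vector-bits=512 the value of aarch64_sve_vg is 8,
+   giving a 512-bit (64-byte) vector and an 8-byte predicate (one predicate
+   bit per vector byte).  */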
+
+/* The SVE mode for a vector of bytes. */
+#define SVE_BYTE_MODE VNx16QImode
+
+/* The maximum number of bytes in a fixed-size vector. This is 256 bytes
+ (for -msve-vector-bits=2048) multiplied by the maximum number of
+ vectors in a structure mode (4).
+
+ This limit must not be used for variable-size vectors, since
+ VL-agnostic code must work with arbitrary vector lengths. */
+#define MAX_COMPILE_TIME_VEC_BYTES (256 * 4)
+#endif
+
+#define REGMODE_NATURAL_SIZE(MODE) aarch64_regmode_natural_size (MODE)
+
#endif /* GCC_AARCH64_H */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 854c448..728136a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -63,6 +63,11 @@
(SFP_REGNUM 64)
(AP_REGNUM 65)
(CC_REGNUM 66)
+ ;; Defined only to make the DWARF description simpler.
+ (VG_REGNUM 67)
+ (P0_REGNUM 68)
+ (P7_REGNUM 75)
+ (P15_REGNUM 83)
]
)
@@ -114,6 +119,7 @@
UNSPEC_PACI1716
UNSPEC_PACISP
UNSPEC_PRLG_STK
+ UNSPEC_REV
UNSPEC_RBIT
UNSPEC_SCVTF
UNSPEC_SISD_NEG
@@ -143,6 +149,18 @@
UNSPEC_RSQRTS
UNSPEC_NZCV
UNSPEC_XPACLRI
+ UNSPEC_LD1_SVE
+ UNSPEC_ST1_SVE
+ UNSPEC_LD1RQ
+ UNSPEC_MERGE_PTRUE
+ UNSPEC_PTEST_PTRUE
+ UNSPEC_UNPACKSHI
+ UNSPEC_UNPACKUHI
+ UNSPEC_UNPACKSLO
+ UNSPEC_UNPACKULO
+ UNSPEC_PACK
+ UNSPEC_FLOAT_CONVERT
+ UNSPEC_WHILE_LO
])
(define_c_enum "unspecv" [
@@ -194,6 +212,11 @@
;; will be disabled when !TARGET_SIMD.
(define_attr "simd" "no,yes" (const_string "no"))
+;; Attribute that specifies whether or not the instruction uses SVE.
+;; When this is set to yes for an alternative, that alternative
+;; will be disabled when !TARGET_SVE.
+(define_attr "sve" "no,yes" (const_string "no"))
+
(define_attr "length" ""
(const_int 4))
@@ -202,13 +225,14 @@
;; registers when -mgeneral-regs-only is specified.
(define_attr "enabled" "no,yes"
(cond [(ior
- (ior
- (and (eq_attr "fp" "yes")
- (eq (symbol_ref "TARGET_FLOAT") (const_int 0)))
- (and (eq_attr "simd" "yes")
- (eq (symbol_ref "TARGET_SIMD") (const_int 0))))
+ (and (eq_attr "fp" "yes")
+ (eq (symbol_ref "TARGET_FLOAT") (const_int 0)))
+ (and (eq_attr "simd" "yes")
+ (eq (symbol_ref "TARGET_SIMD") (const_int 0)))
(and (eq_attr "fp16" "yes")
- (eq (symbol_ref "TARGET_FP_F16INST") (const_int 0))))
+ (eq (symbol_ref "TARGET_FP_F16INST") (const_int 0)))
+ (and (eq_attr "sve" "yes")
+ (eq (symbol_ref "TARGET_SVE") (const_int 0))))
(const_string "no")
] (const_string "yes")))
@@ -866,12 +890,18 @@
"
if (GET_CODE (operands[0]) == MEM && operands[1] != const0_rtx)
operands[1] = force_reg (<MODE>mode, operands[1]);
+
+ if (GET_CODE (operands[1]) == CONST_POLY_INT)
+ {
+ aarch64_expand_mov_immediate (operands[0], operands[1]);
+ DONE;
+ }
"
)
(define_insn "*mov<mode>_aarch64"
- [(set (match_operand:SHORT 0 "nonimmediate_operand" "=r,r, *w,r,*w, m, m, r,*w,*w")
- (match_operand:SHORT 1 "general_operand" " r,M,D<hq>,m, m,rZ,*w,*w, r,*w"))]
+ [(set (match_operand:SHORT 0 "nonimmediate_operand" "=r,r, *w,r ,r,*w, m, m, r,*w,*w")
+ (match_operand:SHORT 1 "aarch64_mov_operand" " r,M,D<hq>,Usv,m, m,rZ,*w,*w, r,*w"))]
"(register_operand (operands[0], <MODE>mode)
|| aarch64_reg_or_zero (operands[1], <MODE>mode))"
{
@@ -885,26 +915,30 @@
return aarch64_output_scalar_simd_mov_immediate (operands[1],
<MODE>mode);
case 3:
- return "ldr<size>\t%w0, %1";
+ return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
case 4:
- return "ldr\t%<size>0, %1";
+ return "ldr<size>\t%w0, %1";
case 5:
- return "str<size>\t%w1, %0";
+ return "ldr\t%<size>0, %1";
case 6:
- return "str\t%<size>1, %0";
+ return "str<size>\t%w1, %0";
case 7:
- return "umov\t%w0, %1.<v>[0]";
+ return "str\t%<size>1, %0";
case 8:
- return "dup\t%0.<Vallxd>, %w1";
+ return "umov\t%w0, %1.<v>[0]";
case 9:
+ return "dup\t%0.<Vallxd>, %w1";
+ case 10:
return "dup\t%<Vetype>0, %1.<v>[0]";
default:
gcc_unreachable ();
}
}
- [(set_attr "type" "mov_reg,mov_imm,neon_move,load_4,load_4,store_4,store_4,\
- neon_to_gp<q>,neon_from_gp<q>,neon_dup")
- (set_attr "simd" "*,*,yes,*,*,*,*,yes,yes,yes")]
+ ;; The "mov_imm" type for CNT is just a placeholder.
+ [(set_attr "type" "mov_reg,mov_imm,neon_move,mov_imm,load_4,load_4,store_4,
+ store_4,neon_to_gp<q>,neon_from_gp<q>,neon_dup")
+ (set_attr "simd" "*,*,yes,*,*,*,*,*,yes,yes,yes")
+ (set_attr "sve" "*,*,*,yes,*,*,*,*,*,*,*")]
)
(define_expand "mov<mode>"
@@ -932,8 +966,8 @@
)
(define_insn_and_split "*movsi_aarch64"
- [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r,w, m, m, r, r, w,r,w, w")
- (match_operand:SI 1 "aarch64_mov_operand" " r,r,k,M,n,m,m,rZ,*w,Usa,Ush,rZ,w,w,Ds"))]
+ [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m, r, r, w,r,w, w")
+ (match_operand:SI 1 "aarch64_mov_operand" " r,r,k,M,n,Usv,m,m,rZ,*w,Usa,Ush,rZ,w,w,Ds"))]
"(register_operand (operands[0], SImode)
|| aarch64_reg_or_zero (operands[1], SImode))"
"@
@@ -942,6 +976,7 @@
mov\\t%w0, %w1
mov\\t%w0, %1
#
+ * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
ldr\\t%w0, %1
ldr\\t%s0, %1
str\\t%w1, %0
@@ -959,15 +994,17 @@
aarch64_expand_mov_immediate (operands[0], operands[1]);
DONE;
}"
- [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,load_4,load_4,store_4,store_4,\
- adr,adr,f_mcr,f_mrc,fmov,neon_move")
- (set_attr "fp" "*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
- (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")]
+ ;; The "mov_imm" type for CNT is just a placeholder.
+ [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_4,
+ load_4,store_4,store_4,adr,adr,f_mcr,f_mrc,fmov,neon_move")
+ (set_attr "fp" "*,*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
+ (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")
+ (set_attr "sve" "*,*,*,*,*,yes,*,*,*,*,*,*,*,*,*,*")]
)
(define_insn_and_split "*movdi_aarch64"
- [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,r,w, m,m, r, r, w,r,w, w")
- (match_operand:DI 1 "aarch64_mov_operand" " r,r,k,N,M,n,m,m,rZ,w,Usa,Ush,rZ,w,w,Dd"))]
+ [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,r, r,w, m,m, r, r, w,r,w, w")
+ (match_operand:DI 1 "aarch64_mov_operand" " r,r,k,N,M,n,Usv,m,m,rZ,w,Usa,Ush,rZ,w,w,Dd"))]
"(register_operand (operands[0], DImode)
|| aarch64_reg_or_zero (operands[1], DImode))"
"@
@@ -977,6 +1014,7 @@
mov\\t%x0, %1
mov\\t%w0, %1
#
+ * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
ldr\\t%x0, %1
ldr\\t%d0, %1
str\\t%x1, %0
@@ -994,10 +1032,13 @@
aarch64_expand_mov_immediate (operands[0], operands[1]);
DONE;
}"
- [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_8,\
- load_8,store_8,store_8,adr,adr,f_mcr,f_mrc,fmov,neon_move")
- (set_attr "fp" "*,*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
- (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")]
+ ;; The "mov_imm" type for CNTD is just a placeholder.
+ [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,mov_imm,
+ load_8,load_8,store_8,store_8,adr,adr,f_mcr,f_mrc,fmov,
+ neon_move")
+ (set_attr "fp" "*,*,*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
+ (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")
+ (set_attr "sve" "*,*,*,*,*,*,yes,*,*,*,*,*,*,*,*,*,*")]
)
(define_insn "insv_imm<mode>"
@@ -1018,6 +1059,14 @@
"
if (GET_CODE (operands[0]) == MEM && operands[1] != const0_rtx)
operands[1] = force_reg (TImode, operands[1]);
+
+ if (GET_CODE (operands[1]) == CONST_POLY_INT)
+ {
+ emit_move_insn (gen_lowpart (DImode, operands[0]),
+ gen_lowpart (DImode, operands[1]));
+ emit_move_insn (gen_highpart (DImode, operands[0]), const0_rtx);
+ DONE;
+ }
"
)
@@ -1542,7 +1591,7 @@
[(set
(match_operand:GPI 0 "register_operand" "")
(plus:GPI (match_operand:GPI 1 "register_operand" "")
- (match_operand:GPI 2 "aarch64_pluslong_operand" "")))]
+ (match_operand:GPI 2 "aarch64_pluslong_or_poly_operand" "")))]
""
{
/* If operands[1] is a subreg extract the inner RTX. */
@@ -1555,23 +1604,34 @@
&& (!REG_P (op1)
|| !REGNO_PTR_FRAME_P (REGNO (op1))))
operands[2] = force_reg (<MODE>mode, operands[2]);
+ /* Expand polynomial additions now if the destination is the stack
+ pointer, since we don't want to use that as a temporary. */
+ else if (operands[0] == stack_pointer_rtx
+ && aarch64_split_add_offset_immediate (operands[2], <MODE>mode))
+ {
+ aarch64_split_add_offset (<MODE>mode, operands[0], operands[1],
+ operands[2], NULL_RTX, NULL_RTX);
+ DONE;
+ }
})
(define_insn "*add<mode>3_aarch64"
[(set
- (match_operand:GPI 0 "register_operand" "=rk,rk,w,rk,r")
+ (match_operand:GPI 0 "register_operand" "=rk,rk,w,rk,r,rk")
(plus:GPI
- (match_operand:GPI 1 "register_operand" "%rk,rk,w,rk,rk")
- (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Uaa")))]
+ (match_operand:GPI 1 "register_operand" "%rk,rk,w,rk,rk,rk")
+ (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Uaa,Uav")))]
""
"@
add\\t%<w>0, %<w>1, %2
add\\t%<w>0, %<w>1, %<w>2
add\\t%<rtn>0<vas>, %<rtn>1<vas>, %<rtn>2<vas>
sub\\t%<w>0, %<w>1, #%n2
- #"
- [(set_attr "type" "alu_imm,alu_sreg,neon_add,alu_imm,multiple")
- (set_attr "simd" "*,*,yes,*,*")]
+ #
+ * return aarch64_output_sve_addvl_addpl (operands[0], operands[1], operands[2]);"
+ ;; The "alu_imm" type for ADDVL/ADDPL is just a placeholder.
+ [(set_attr "type" "alu_imm,alu_sreg,neon_add,alu_imm,multiple,alu_imm")
+ (set_attr "simd" "*,*,yes,*,*,*")]
)
;; zero_extend version of above
@@ -1633,6 +1693,48 @@
}
)
+;; Match addition of polynomial offsets that require one temporary, for which
+;; we can use the early-clobbered destination register. This is a separate
+;; pattern so that the early clobber doesn't affect register allocation
+;; for other forms of addition. However, we still need to provide an
+;; all-register alternative, in case the offset goes out of range after
+;; elimination. For completeness we might as well provide all GPR-based
+;; alternatives from the main pattern.
+;;
+;; We don't have a pattern for additions requiring two temporaries since at
+;; present LRA doesn't allow new scratches to be added during elimination.
+;; Such offsets should be rare anyway.
+;;
+;; ??? But if we added LRA support for new scratches, much of the ugliness
+;; here would go away. We could just handle all polynomial constants in
+;; this pattern.
+(define_insn_and_split "*add<mode>3_poly_1"
+ [(set
+ (match_operand:GPI 0 "register_operand" "=r,r,r,r,r,&r")
+ (plus:GPI
+ (match_operand:GPI 1 "register_operand" "%rk,rk,rk,rk,rk,rk")
+ (match_operand:GPI 2 "aarch64_pluslong_or_poly_operand" "I,r,J,Uaa,Uav,Uat")))]
+ "TARGET_SVE && operands[0] != stack_pointer_rtx"
+ "@
+ add\\t%<w>0, %<w>1, %2
+ add\\t%<w>0, %<w>1, %<w>2
+ sub\\t%<w>0, %<w>1, #%n2
+ #
+ * return aarch64_output_sve_addvl_addpl (operands[0], operands[1], operands[2]);
+ #"
+ "&& epilogue_completed
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && aarch64_split_add_offset_immediate (operands[2], <MODE>mode)"
+ [(const_int 0)]
+ {
+ aarch64_split_add_offset (<MODE>mode, operands[0], operands[1],
+ operands[2], operands[0], NULL_RTX);
+ DONE;
+ }
+ ;; The "alu_imm" type for ADDVL/ADDPL is just a placeholder.
+ [(set_attr "type" "alu_imm,alu_sreg,alu_imm,multiple,alu_imm,multiple")]
+)
+
(define_split
[(set (match_operand:DI 0 "register_operand")
(zero_extend:DI
@@ -5797,6 +5899,12 @@
DONE;
})
+;; Helper for aarch64.c code.
+(define_expand "set_clobber_cc"
+ [(parallel [(set (match_operand 0)
+ (match_operand 1))
+ (clobber (reg:CC CC_REGNUM))])])
+
;; AdvSIMD Stuff
(include "aarch64-simd.md")
@@ -5805,3 +5913,6 @@
;; ldp/stp peephole patterns
(include "aarch64-ldpstp.md")
+
+;; SVE.
+(include "aarch64-sve.md")
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 18bf0e3..52eaf8c 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -185,6 +185,32 @@ Enable the division approximation. Enabling this reduces
precision of division results to about 16 bits for
single precision and to 32 bits for double precision.
+Enum
+Name(sve_vector_bits) Type(enum aarch64_sve_vector_bits_enum)
+The possible SVE vector lengths:
+
+EnumValue
+Enum(sve_vector_bits) String(scalable) Value(SVE_SCALABLE)
+
+EnumValue
+Enum(sve_vector_bits) String(128) Value(SVE_128)
+
+EnumValue
+Enum(sve_vector_bits) String(256) Value(SVE_256)
+
+EnumValue
+Enum(sve_vector_bits) String(512) Value(SVE_512)
+
+EnumValue
+Enum(sve_vector_bits) String(1024) Value(SVE_1024)
+
+EnumValue
+Enum(sve_vector_bits) String(2048) Value(SVE_2048)
+
+msve-vector-bits=
+Target RejectNegative Joined Enum(sve_vector_bits) Var(aarch64_sve_vector_bits) Init(SVE_SCALABLE)
+-msve-vector-bits=N Set the number of bits in an SVE vector register to N.
+
mverbose-cost-dump
Common Undocumented Var(flag_aarch64_verbose_cost)
Enables verbose cost model dumping in the debug dump files.
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 18adbc6..b004f78 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -27,6 +27,12 @@
(define_register_constraint "w" "FP_REGS"
"Floating point and SIMD vector registers.")
+(define_register_constraint "Upa" "PR_REGS"
+ "SVE predicate registers p0 - p15.")
+
+(define_register_constraint "Upl" "PR_LO_REGS"
+ "SVE predicate registers p0 - p7.")
+
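As a hedged aside on how register constraints like these surface in user code: the pre-existing "w" constraint can be used directly from inline assembly, as in the illustrative function below (the new Upa/Upl predicate constraints have no C-level operand type to pair with yet). The function name and the choice of FADD are invented for the example.

/* "w" lets the compiler pick any FP/SIMD register; the %d operand
   modifier prints the register's 64-bit scalar (D) form.  */
double add_one (double x)
{
  double one = 1.0, out;
  __asm__ ("fadd %d0, %d1, %d2" : "=w" (out) : "w" (x), "w" (one));
  return out;
}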
(define_register_constraint "x" "FP_LO_REGS"
"Floating point and SIMD vector registers V0 - V15.")
@@ -40,6 +46,18 @@
(and (match_code "const_int")
(match_test "aarch64_pluslong_strict_immedate (op, VOIDmode)")))
+(define_constraint "Uav"
+ "@internal
+ A constraint that matches a VG-based constant that can be added by
+ a single ADDVL or ADDPL."
+ (match_operand 0 "aarch64_sve_addvl_addpl_immediate"))
+
+(define_constraint "Uat"
+ "@internal
+ A constraint that matches a VG-based constant that can be added by
+ using multiple instructions, with one temporary register."
+ (match_operand 0 "aarch64_split_add_offset_immediate"))
+
(define_constraint "J"
"A constant that can be used with a SUB operation (once negated)."
(and (match_code "const_int")
@@ -134,6 +152,18 @@
A constraint that matches the immediate constant -1."
(match_test "op == constm1_rtx"))
+(define_constraint "Usv"
+ "@internal
+ A constraint that matches a VG-based constant that can be loaded by
+ a single CNT[BHWD]."
+ (match_operand 0 "aarch64_sve_cnt_immediate"))
+
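A brief illustration, assuming the plain ALL-pattern form of the instructions, of what "a single CNT[BHWD]" computes; VG again denotes the number of 64-bit granules per vector. Sketch only, not compiler code.

#include <stdint.h>

/* CNTB/CNTH/CNTW/CNTD return the number of byte/half/word/doubleword
   elements in one SVE vector, optionally scaled by a multiplier of
   1..16; e.g. CNTW with multiplier 1 yields 2 * VG.  */
static inline uint64_t cnt_elements (unsigned element_bytes, unsigned vg,
                                     unsigned mul)
{
  return (uint64_t) (8 / element_bytes) * vg * mul;
}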
+(define_constraint "Usi"
+ "@internal
+ A constraint that matches an immediate operand valid for
+ the SVE INDEX instruction."
+ (match_operand 0 "aarch64_sve_index_immediate"))
+
(define_constraint "Ui1"
"@internal
A constraint that matches the immediate constant +1."
@@ -192,6 +222,13 @@
(match_test "aarch64_legitimate_address_p (DFmode, XEXP (op, 0), 1,
ADDR_QUERY_LDP_STP)")))
+(define_memory_constraint "Utr"
+ "@internal
+ An address valid for SVE LDR and STR instructions (as distinct from
+ LD[1234] and ST[1234] patterns)."
+ (and (match_code "mem")
+ (match_test "aarch64_sve_ldr_operand_p (op)")))
+
(define_memory_constraint "Utv"
"@internal
An address valid for loading/storing opaque structure
@@ -206,6 +243,12 @@
(match_test "aarch64_legitimate_address_p (V2DImode,
XEXP (op, 0), 1)")))
+(define_memory_constraint "Uty"
+ "@internal
+ An address valid for SVE LD1Rs."
+ (and (match_code "mem")
+ (match_test "aarch64_sve_ld1r_operand_p (op)")))
+
(define_constraint "Ufc"
"A floating point constant which can be used with an\
FMOV immediate operation."
@@ -235,7 +278,7 @@
(define_constraint "Dn"
"@internal
A constraint that matches a vector of immediates."
- (and (match_code "const_vector")
+ (and (match_code "const,const_vector")
(match_test "aarch64_simd_valid_immediate (op, NULL)")))
(define_constraint "Dh"
@@ -257,21 +300,27 @@
(define_constraint "Dl"
"@internal
A constraint that matches a vector of immediates for left shifts."
- (and (match_code "const_vector")
+ (and (match_code "const,const_vector")
(match_test "aarch64_simd_shift_imm_p (op, GET_MODE (op),
true)")))
(define_constraint "Dr"
"@internal
A constraint that matches a vector of immediates for right shifts."
- (and (match_code "const_vector")
+ (and (match_code "const,const_vector")
(match_test "aarch64_simd_shift_imm_p (op, GET_MODE (op),
false)")))
(define_constraint "Dz"
"@internal
- A constraint that matches vector of immediate zero."
- (and (match_code "const_vector")
- (match_test "aarch64_simd_imm_zero_p (op, GET_MODE (op))")))
+ A constraint that matches a vector of immediate zero."
+ (and (match_code "const,const_vector")
+ (match_test "op == CONST0_RTX (GET_MODE (op))")))
+
+(define_constraint "Dm"
+ "@internal
+ A constraint that matches a vector of immediate minus one."
+ (and (match_code "const,const_vector")
+ (match_test "op == CONST1_RTX (GET_MODE (op))")))
(define_constraint "Dd"
"@internal
@@ -291,3 +340,62 @@
"@internal
An address valid for a prefetch instruction."
(match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
+
+(define_constraint "vsa"
+ "@internal
+ A constraint that matches an immediate operand valid for SVE
+ arithmetic instructions."
+ (match_operand 0 "aarch64_sve_arith_immediate"))
+
+(define_constraint "vsc"
+ "@internal
+ A constraint that matches a signed immediate operand valid for SVE
+ CMP instructions."
+ (match_operand 0 "aarch64_sve_cmp_vsc_immediate"))
+
+(define_constraint "vsd"
+ "@internal
+ A constraint that matches an unsigned immediate operand valid for SVE
+ CMP instructions."
+ (match_operand 0 "aarch64_sve_cmp_vsd_immediate"))
+
+(define_constraint "vsi"
+ "@internal
+ A constraint that matches a vector count operand valid for SVE INC and
+ DEC instructions."
+ (match_operand 0 "aarch64_sve_inc_dec_immediate"))
+
+(define_constraint "vsn"
+ "@internal
+ A constraint that matches an immediate operand whose negative
+ is valid for SVE SUB instructions."
+ (match_operand 0 "aarch64_sve_sub_arith_immediate"))
+
+(define_constraint "vsl"
+ "@internal
+ A constraint that matches an immediate operand valid for SVE logical
+ operations."
+ (match_operand 0 "aarch64_sve_logical_immediate"))
+
+(define_constraint "vsm"
+ "@internal
+ A constraint that matches an immediate operand valid for SVE MUL
+ operations."
+ (match_operand 0 "aarch64_sve_mul_immediate"))
+
+(define_constraint "vsA"
+ "@internal
+ A constraint that matches an immediate operand valid for SVE FADD
+ and FSUB operations."
+ (match_operand 0 "aarch64_sve_float_arith_immediate"))
+
+(define_constraint "vsM"
+ "@internal
+ A constraint that matches an immediate operand valid for SVE FMUL
+ operations."
+ (match_operand 0 "aarch64_sve_float_mul_immediate"))
+
+(define_constraint "vsN"
+ "@internal
+ A constraint that matches the negative of vsA."
+ (match_operand 0 "aarch64_sve_float_arith_with_sub_immediate"))
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index e199dfd..0fe42ed 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -56,20 +56,20 @@
;; Iterator for all scalar floating point modes (SF, DF and TF)
(define_mode_iterator GPF_TF [SF DF TF])
-;; Integer vector modes.
+;; Integer Advanced SIMD modes.
(define_mode_iterator VDQ_I [V8QI V16QI V4HI V8HI V2SI V4SI V2DI])
-;; vector and scalar, 64 & 128-bit container, all integer modes
+;; Advanced SIMD and scalar, 64 & 128-bit container, all integer modes.
(define_mode_iterator VSDQ_I [V8QI V16QI V4HI V8HI V2SI V4SI V2DI QI HI SI DI])
-;; vector and scalar, 64 & 128-bit container: all vector integer modes;
-;; 64-bit scalar integer mode
+;; Advanced SIMD and scalar, 64 & 128-bit container: all Advanced SIMD
+;; integer modes; 64-bit scalar integer mode.
(define_mode_iterator VSDQ_I_DI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI DI])
;; Double vector modes.
(define_mode_iterator VD [V8QI V4HI V4HF V2SI V2SF])
-;; vector, 64-bit container, all integer modes
+;; Advanced SIMD, 64-bit container, all integer modes.
(define_mode_iterator VD_BHSI [V8QI V4HI V2SI])
;; 128 and 64-bit container; 8, 16, 32-bit vector integer modes
@@ -94,16 +94,16 @@
;; pointer-sized quantities. Exactly one of the two alternatives will match.
(define_mode_iterator PTR [(SI "ptr_mode == SImode") (DI "ptr_mode == DImode")])
-;; Vector Float modes suitable for moving, loading and storing.
+;; Advanced SIMD Float modes suitable for moving, loading and storing.
(define_mode_iterator VDQF_F16 [V4HF V8HF V2SF V4SF V2DF])
-;; Vector Float modes.
+;; Advanced SIMD Float modes.
(define_mode_iterator VDQF [V2SF V4SF V2DF])
(define_mode_iterator VHSDF [(V4HF "TARGET_SIMD_F16INST")
(V8HF "TARGET_SIMD_F16INST")
V2SF V4SF V2DF])
-;; Vector Float modes, and DF.
+;; Advanced SIMD Float modes, and DF.
(define_mode_iterator VHSDF_DF [(V4HF "TARGET_SIMD_F16INST")
(V8HF "TARGET_SIMD_F16INST")
V2SF V4SF V2DF DF])
@@ -113,7 +113,7 @@
(HF "TARGET_SIMD_F16INST")
SF DF])
-;; Vector single Float modes.
+;; Advanced SIMD single Float modes.
(define_mode_iterator VDQSF [V2SF V4SF])
;; Quad vector Float modes with half/single elements.
@@ -122,16 +122,16 @@
;; Modes suitable to use as the return type of a vcond expression.
(define_mode_iterator VDQF_COND [V2SF V2SI V4SF V4SI V2DF V2DI])
-;; All Float modes.
+;; All scalar and Advanced SIMD Float modes.
(define_mode_iterator VALLF [V2SF V4SF V2DF SF DF])
-;; Vector Float modes with 2 elements.
+;; Advanced SIMD Float modes with 2 elements.
(define_mode_iterator V2F [V2SF V2DF])
-;; All vector modes on which we support any arithmetic operations.
+;; All Advanced SIMD modes on which we support any arithmetic operations.
(define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF])
-;; All vector modes suitable for moving, loading, and storing.
+;; All Advanced SIMD modes suitable for moving, loading, and storing.
(define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
V4HF V8HF V2SF V4SF V2DF])
@@ -139,21 +139,21 @@
(define_mode_iterator VALL_F16_NO_V2Q [V8QI V16QI V4HI V8HI V2SI V4SI
V4HF V8HF V2SF V4SF])
-;; All vector modes barring HF modes, plus DI.
+;; All Advanced SIMD modes barring HF modes, plus DI.
(define_mode_iterator VALLDI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF DI])
-;; All vector modes and DI.
+;; All Advanced SIMD modes and DI.
(define_mode_iterator VALLDI_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
V4HF V8HF V2SF V4SF V2DF DI])
-;; All vector modes, plus DI and DF.
+;; All Advanced SIMD modes, plus DI and DF.
(define_mode_iterator VALLDIF [V8QI V16QI V4HI V8HI V2SI V4SI
V2DI V4HF V8HF V2SF V4SF V2DF DI DF])
-;; Vector modes for Integer reduction across lanes.
+;; Advanced SIMD modes for Integer reduction across lanes.
(define_mode_iterator VDQV [V8QI V16QI V4HI V8HI V4SI V2DI])
-;; Vector modes(except V2DI) for Integer reduction across lanes.
+;; Advanced SIMD modes (except V2DI) for Integer reduction across lanes.
(define_mode_iterator VDQV_S [V8QI V16QI V4HI V8HI V4SI])
;; All double integer narrow-able modes.
@@ -162,7 +162,8 @@
;; All quad integer narrow-able modes.
(define_mode_iterator VQN [V8HI V4SI V2DI])
-;; Vector and scalar 128-bit container: narrowable 16, 32, 64-bit integer modes
+;; Advanced SIMD and scalar 128-bit container: narrowable 16, 32, 64-bit
+;; integer modes
(define_mode_iterator VSQN_HSDI [V8HI V4SI V2DI HI SI DI])
;; All quad integer widen-able modes.
@@ -171,54 +172,54 @@
;; Double vector modes for combines.
(define_mode_iterator VDC [V8QI V4HI V4HF V2SI V2SF DI DF])
-;; Vector modes except double int.
+;; Advanced SIMD modes except double int.
(define_mode_iterator VDQIF [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF V2DF])
(define_mode_iterator VDQIF_F16 [V8QI V16QI V4HI V8HI V2SI V4SI
V4HF V8HF V2SF V4SF V2DF])
-;; Vector modes for S type.
+;; Advanced SIMD modes for S type.
(define_mode_iterator VDQ_SI [V2SI V4SI])
-;; Vector modes for S and D
+;; Advanced SIMD modes for S and D.
(define_mode_iterator VDQ_SDI [V2SI V4SI V2DI])
-;; Vector modes for H, S and D
+;; Advanced SIMD modes for H, S and D.
(define_mode_iterator VDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
(V8HI "TARGET_SIMD_F16INST")
V2SI V4SI V2DI])
-;; Scalar and Vector modes for S and D
+;; Scalar and Advanced SIMD modes for S and D.
(define_mode_iterator VSDQ_SDI [V2SI V4SI V2DI SI DI])
-;; Scalar and Vector modes for S and D, Vector modes for H.
+;; Scalar and Advanced SIMD modes for S and D, Advanced SIMD modes for H.
(define_mode_iterator VSDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
(V8HI "TARGET_SIMD_F16INST")
V2SI V4SI V2DI
(HI "TARGET_SIMD_F16INST")
SI DI])
-;; Vector modes for Q and H types.
+;; Advanced SIMD modes for Q and H types.
(define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI])
-;; Vector modes for H and S types.
+;; Advanced SIMD modes for H and S types.
(define_mode_iterator VDQHS [V4HI V8HI V2SI V4SI])
-;; Vector modes for H, S and D types.
+;; Advanced SIMD modes for H, S and D types.
(define_mode_iterator VDQHSD [V4HI V8HI V2SI V4SI V2DI])
-;; Vector and scalar integer modes for H and S
+;; Advanced SIMD and scalar integer modes for H and S.
(define_mode_iterator VSDQ_HSI [V4HI V8HI V2SI V4SI HI SI])
-;; Vector and scalar 64-bit container: 16, 32-bit integer modes
+;; Advanced SIMD and scalar 64-bit container: 16, 32-bit integer modes.
(define_mode_iterator VSD_HSI [V4HI V2SI HI SI])
-;; Vector 64-bit container: 16, 32-bit integer modes
+;; Advanced SIMD 64-bit container: 16, 32-bit integer modes.
(define_mode_iterator VD_HSI [V4HI V2SI])
;; Scalar 64-bit container: 16, 32-bit integer modes
(define_mode_iterator SD_HSI [HI SI])
-;; Vector 64-bit container: 16, 32-bit integer modes
+;; Advanced SIMD 64-bit container: 16, 32-bit integer modes.
(define_mode_iterator VQ_HSI [V8HI V4SI])
;; All byte modes.
@@ -229,21 +230,59 @@
(define_mode_iterator TX [TI TF])
-;; Opaque structure modes.
+;; Advanced SIMD opaque structure modes.
(define_mode_iterator VSTRUCT [OI CI XI])
;; Double scalar modes
(define_mode_iterator DX [DI DF])
-;; Modes available for <f>mul lane operations.
+;; Modes available for Advanced SIMD <f>mul lane operations.
(define_mode_iterator VMUL [V4HI V8HI V2SI V4SI
(V4HF "TARGET_SIMD_F16INST")
(V8HF "TARGET_SIMD_F16INST")
V2SF V4SF V2DF])
-;; Modes available for <f>mul lane operations changing lane count.
+;; Modes available for Advanced SIMD <f>mul lane operations changing lane
+;; count.
(define_mode_iterator VMUL_CHANGE_NLANES [V4HI V8HI V2SI V4SI V2SF V4SF])
+;; All SVE vector modes.
+(define_mode_iterator SVE_ALL [VNx16QI VNx8HI VNx4SI VNx2DI
+ VNx8HF VNx4SF VNx2DF])
+
+;; All SVE vector modes that have 8-bit or 16-bit elements.
+(define_mode_iterator SVE_BH [VNx16QI VNx8HI VNx8HF])
+
+;; All SVE vector modes that have 8-bit, 16-bit or 32-bit elements.
+(define_mode_iterator SVE_BHS [VNx16QI VNx8HI VNx4SI VNx8HF VNx4SF])
+
+;; All SVE integer vector modes that have 8-bit, 16-bit or 32-bit elements.
+(define_mode_iterator SVE_BHSI [VNx16QI VNx8HI VNx4SI])
+
+;; All SVE integer vector modes that have 16-bit, 32-bit or 64-bit elements.
+(define_mode_iterator SVE_HSDI [VNx8HI VNx4SI VNx2DI])
+
+;; All SVE floating-point vector modes that have 16-bit or 32-bit elements.
+(define_mode_iterator SVE_HSF [VNx8HF VNx4SF])
+
+;; All SVE vector modes that have 32-bit or 64-bit elements.
+(define_mode_iterator SVE_SD [VNx4SI VNx2DI VNx4SF VNx2DF])
+
+;; All SVE integer vector modes that have 32-bit or 64-bit elements.
+(define_mode_iterator SVE_SDI [VNx4SI VNx2DI])
+
+;; All SVE integer vector modes.
+(define_mode_iterator SVE_I [VNx16QI VNx8HI VNx4SI VNx2DI])
+
+;; All SVE floating-point vector modes.
+(define_mode_iterator SVE_F [VNx8HF VNx4SF VNx2DF])
+
+;; All SVE predicate modes.
+(define_mode_iterator PRED_ALL [VNx16BI VNx8BI VNx4BI VNx2BI])
+
+;; SVE predicate modes that control 8-bit, 16-bit or 32-bit elements.
+(define_mode_iterator PRED_BHS [VNx16BI VNx8BI VNx4BI])
+
;; ------------------------------------------------------------------
;; Unspec enumerations for Advanced SIMD. These could well go into
;; aarch64.md but for their use in int_iterators here.
@@ -378,6 +417,22 @@
UNSPEC_FMLSL ; Used in aarch64-simd.md.
UNSPEC_FMLAL2 ; Used in aarch64-simd.md.
UNSPEC_FMLSL2 ; Used in aarch64-simd.md.
+ UNSPEC_SEL ; Used in aarch64-sve.md.
+ UNSPEC_ANDF ; Used in aarch64-sve.md.
+ UNSPEC_IORF ; Used in aarch64-sve.md.
+ UNSPEC_XORF ; Used in aarch64-sve.md.
+ UNSPEC_COND_LT ; Used in aarch64-sve.md.
+ UNSPEC_COND_LE ; Used in aarch64-sve.md.
+ UNSPEC_COND_EQ ; Used in aarch64-sve.md.
+ UNSPEC_COND_NE ; Used in aarch64-sve.md.
+ UNSPEC_COND_GE ; Used in aarch64-sve.md.
+ UNSPEC_COND_GT ; Used in aarch64-sve.md.
+ UNSPEC_COND_LO ; Used in aarch64-sve.md.
+ UNSPEC_COND_LS ; Used in aarch64-sve.md.
+ UNSPEC_COND_HS ; Used in aarch64-sve.md.
+ UNSPEC_COND_HI ; Used in aarch64-sve.md.
+ UNSPEC_COND_UO ; Used in aarch64-sve.md.
+ UNSPEC_LASTB ; Used in aarch64-sve.md.
])
;; ------------------------------------------------------------------
@@ -535,17 +590,24 @@
(HI "")])
;; Mode-to-individual element type mapping.
-(define_mode_attr Vetype [(V8QI "b") (V16QI "b")
- (V4HI "h") (V8HI "h")
- (V2SI "s") (V4SI "s")
- (V2DI "d") (V4HF "h")
- (V8HF "h") (V2SF "s")
- (V4SF "s") (V2DF "d")
+(define_mode_attr Vetype [(V8QI "b") (V16QI "b") (VNx16QI "b") (VNx16BI "b")
+ (V4HI "h") (V8HI "h") (VNx8HI "h") (VNx8BI "h")
+ (V2SI "s") (V4SI "s") (VNx4SI "s") (VNx4BI "s")
+ (V2DI "d") (VNx2DI "d") (VNx2BI "d")
+ (V4HF "h") (V8HF "h") (VNx8HF "h")
+ (V2SF "s") (V4SF "s") (VNx4SF "s")
+ (V2DF "d") (VNx2DF "d")
(HF "h")
(SF "s") (DF "d")
(QI "b") (HI "h")
(SI "s") (DI "d")])
+;; Equivalent of "size" for a vector element.
+(define_mode_attr Vesize [(VNx16QI "b")
+ (VNx8HI "h") (VNx8HF "h")
+ (VNx4SI "w") (VNx4SF "w")
+ (VNx2DI "d") (VNx2DF "d")])
+
;; Vetype is used everywhere in scheduling type and assembly output;
;; sometimes they are not the same, for example for HF modes on some
;; instructions. stype is defined to represent scheduling type
@@ -567,27 +629,45 @@
(SI "8b")])
;; Define element mode for each vector mode.
-(define_mode_attr VEL [(V8QI "QI") (V16QI "QI")
- (V4HI "HI") (V8HI "HI")
- (V2SI "SI") (V4SI "SI")
- (DI "DI") (V2DI "DI")
- (V4HF "HF") (V8HF "HF")
- (V2SF "SF") (V4SF "SF")
- (V2DF "DF") (DF "DF")
- (SI "SI") (HI "HI")
+(define_mode_attr VEL [(V8QI "QI") (V16QI "QI") (VNx16QI "QI")
+ (V4HI "HI") (V8HI "HI") (VNx8HI "HI")
+ (V2SI "SI") (V4SI "SI") (VNx4SI "SI")
+ (DI "DI") (V2DI "DI") (VNx2DI "DI")
+ (V4HF "HF") (V8HF "HF") (VNx8HF "HF")
+ (V2SF "SF") (V4SF "SF") (VNx4SF "SF")
+ (DF "DF") (V2DF "DF") (VNx2DF "DF")
+ (SI "SI") (HI "HI")
(QI "QI")])
;; Define element mode for each vector mode (lower case).
-(define_mode_attr Vel [(V8QI "qi") (V16QI "qi")
- (V4HI "hi") (V8HI "hi")
- (V2SI "si") (V4SI "si")
- (DI "di") (V2DI "di")
- (V4HF "hf") (V8HF "hf")
- (V2SF "sf") (V4SF "sf")
- (V2DF "df") (DF "df")
+(define_mode_attr Vel [(V8QI "qi") (V16QI "qi") (VNx16QI "qi")
+ (V4HI "hi") (V8HI "hi") (VNx8HI "hi")
+ (V2SI "si") (V4SI "si") (VNx4SI "si")
+ (DI "di") (V2DI "di") (VNx2DI "di")
+ (V4HF "hf") (V8HF "hf") (VNx8HF "hf")
+ (V2SF "sf") (V4SF "sf") (VNx4SF "sf")
+ (V2DF "df") (DF "df") (VNx2DF "df")
(SI "si") (HI "hi")
(QI "qi")])
+;; Element mode with floating-point values replaced by like-sized integers.
+(define_mode_attr VEL_INT [(VNx16QI "QI")
+ (VNx8HI "HI") (VNx8HF "HI")
+ (VNx4SI "SI") (VNx4SF "SI")
+ (VNx2DI "DI") (VNx2DF "DI")])
+
+;; Gives the mode of the 128-bit lowpart of an SVE vector.
+(define_mode_attr V128 [(VNx16QI "V16QI")
+ (VNx8HI "V8HI") (VNx8HF "V8HF")
+ (VNx4SI "V4SI") (VNx4SF "V4SF")
+ (VNx2DI "V2DI") (VNx2DF "V2DF")])
+
+;; ...and again in lower case.
+(define_mode_attr v128 [(VNx16QI "v16qi")
+ (VNx8HI "v8hi") (VNx8HF "v8hf")
+ (VNx4SI "v4si") (VNx4SF "v4sf")
+ (VNx2DI "v2di") (VNx2DF "v2df")])
+
;; 64-bit container modes for the inner or scalar source mode.
(define_mode_attr VCOND [(HI "V4HI") (SI "V2SI")
(V4HI "V4HI") (V8HI "V4HI")
@@ -666,16 +746,28 @@
(V2DI "4s")])
;; Widened modes of vector modes.
-(define_mode_attr VWIDE [(V8QI "V8HI") (V4HI "V4SI")
- (V2SI "V2DI") (V16QI "V8HI")
- (V8HI "V4SI") (V4SI "V2DI")
- (HI "SI") (SI "DI")
- (V8HF "V4SF") (V4SF "V2DF")
- (V4HF "V4SF") (V2SF "V2DF")]
-)
+(define_mode_attr VWIDE [(V8QI "V8HI") (V4HI "V4SI")
+ (V2SI "V2DI") (V16QI "V8HI")
+ (V8HI "V4SI") (V4SI "V2DI")
+ (HI "SI") (SI "DI")
+ (V8HF "V4SF") (V4SF "V2DF")
+ (V4HF "V4SF") (V2SF "V2DF")
+ (VNx8HF "VNx4SF") (VNx4SF "VNx2DF")
+ (VNx16QI "VNx8HI") (VNx8HI "VNx4SI")
+ (VNx4SI "VNx2DI")
+ (VNx16BI "VNx8BI") (VNx8BI "VNx4BI")
+ (VNx4BI "VNx2BI")])
+
+;; Predicate mode associated with VWIDE.
+(define_mode_attr VWIDE_PRED [(VNx8HF "VNx4BI") (VNx4SF "VNx2BI")])
;; Widened modes of vector modes, lowercase
-(define_mode_attr Vwide [(V2SF "v2df") (V4HF "v4sf")])
+(define_mode_attr Vwide [(V2SF "v2df") (V4HF "v4sf")
+ (VNx16QI "vnx8hi") (VNx8HI "vnx4si")
+ (VNx4SI "vnx2di")
+ (VNx8HF "vnx4sf") (VNx4SF "vnx2df")
+ (VNx16BI "vnx8bi") (VNx8BI "vnx4bi")
+ (VNx4BI "vnx2bi")])
;; Widened mode register suffixes for VD_BHSI/VQW/VQ_HSF.
(define_mode_attr Vwtype [(V8QI "8h") (V4HI "4s")
@@ -683,6 +775,11 @@
(V8HI "4s") (V4SI "2d")
(V8HF "4s") (V4SF "2d")])
+;; SVE vector after widening
+(define_mode_attr Vewtype [(VNx16QI "h")
+ (VNx8HI "s") (VNx8HF "s")
+ (VNx4SI "d") (VNx4SF "d")])
+
;; Widened mode register suffixes for VDW/VQW.
(define_mode_attr Vmwtype [(V8QI ".8h") (V4HI ".4s")
(V2SI ".2d") (V16QI ".8h")
@@ -696,22 +793,23 @@
(V4SF "2s")])
;; Define corresponding core/FP element mode for each vector mode.
-(define_mode_attr vw [(V8QI "w") (V16QI "w")
- (V4HI "w") (V8HI "w")
- (V2SI "w") (V4SI "w")
- (DI "x") (V2DI "x")
- (V2SF "s") (V4SF "s")
- (V2DF "d")])
+(define_mode_attr vw [(V8QI "w") (V16QI "w") (VNx16QI "w")
+ (V4HI "w") (V8HI "w") (VNx8HI "w")
+ (V2SI "w") (V4SI "w") (VNx4SI "w")
+ (DI "x") (V2DI "x") (VNx2DI "x")
+ (VNx8HF "h")
+ (V2SF "s") (V4SF "s") (VNx4SF "s")
+ (V2DF "d") (VNx2DF "d")])
;; Corresponding core element mode for each vector mode. This is a
;; variation on <vw> mapping FP modes to GP regs.
-(define_mode_attr vwcore [(V8QI "w") (V16QI "w")
- (V4HI "w") (V8HI "w")
- (V2SI "w") (V4SI "w")
- (DI "x") (V2DI "x")
- (V4HF "w") (V8HF "w")
- (V2SF "w") (V4SF "w")
- (V2DF "x")])
+(define_mode_attr vwcore [(V8QI "w") (V16QI "w") (VNx16QI "w")
+ (V4HI "w") (V8HI "w") (VNx8HI "w")
+ (V2SI "w") (V4SI "w") (VNx4SI "w")
+ (DI "x") (V2DI "x") (VNx2DI "x")
+ (V4HF "w") (V8HF "w") (VNx8HF "w")
+ (V2SF "w") (V4SF "w") (VNx4SF "w")
+ (V2DF "x") (VNx2DF "x")])
;; Double vector types for ALLX.
(define_mode_attr Vallxd [(QI "8b") (HI "4h") (SI "2s")])
@@ -723,8 +821,13 @@
(DI "DI") (V2DI "V2DI")
(V4HF "V4HI") (V8HF "V8HI")
(V2SF "V2SI") (V4SF "V4SI")
- (V2DF "V2DI") (DF "DI")
- (SF "SI") (HF "HI")])
+ (DF "DI") (V2DF "V2DI")
+ (SF "SI") (HF "HI")
+ (VNx16QI "VNx16QI")
+ (VNx8HI "VNx8HI") (VNx8HF "VNx8HI")
+ (VNx4SI "VNx4SI") (VNx4SF "VNx4SI")
+ (VNx2DI "VNx2DI") (VNx2DF "VNx2DI")
+])
;; Lower case mode with floating-point values replaced by like-sized integers.
(define_mode_attr v_int_equiv [(V8QI "v8qi") (V16QI "v16qi")
@@ -733,8 +836,19 @@
(DI "di") (V2DI "v2di")
(V4HF "v4hi") (V8HF "v8hi")
(V2SF "v2si") (V4SF "v4si")
- (V2DF "v2di") (DF "di")
- (SF "si")])
+ (DF "di") (V2DF "v2di")
+ (SF "si")
+ (VNx16QI "vnx16qi")
+ (VNx8HI "vnx8hi") (VNx8HF "vnx8hi")
+ (VNx4SI "vnx4si") (VNx4SF "vnx4si")
+ (VNx2DI "vnx2di") (VNx2DF "vnx2di")
+])
+
+;; Floating-point equivalent of selected modes.
+(define_mode_attr V_FP_EQUIV [(VNx4SI "VNx4SF") (VNx4SF "VNx4SF")
+ (VNx2DI "VNx2DF") (VNx2DF "VNx2DF")])
+(define_mode_attr v_fp_equiv [(VNx4SI "vnx4sf") (VNx4SF "vnx4sf")
+ (VNx2DI "vnx2df") (VNx2DF "vnx2df")])
;; Mode for vector conditional operations where the comparison has
;; different type from the lhs.
@@ -869,6 +983,18 @@
(define_code_attr f16mac [(plus "a") (minus "s")])
+;; The predicate mode associated with an SVE data mode.
+(define_mode_attr VPRED [(VNx16QI "VNx16BI")
+ (VNx8HI "VNx8BI") (VNx8HF "VNx8BI")
+ (VNx4SI "VNx4BI") (VNx4SF "VNx4BI")
+ (VNx2DI "VNx2BI") (VNx2DF "VNx2BI")])
+
+;; ...and again in lower case.
+(define_mode_attr vpred [(VNx16QI "vnx16bi")
+ (VNx8HI "vnx8bi") (VNx8HF "vnx8bi")
+ (VNx4SI "vnx4bi") (VNx4SF "vnx4bi")
+ (VNx2DI "vnx2bi") (VNx2DF "vnx2bi")])
+
;; -------------------------------------------------------------------
;; Code Iterators
;; -------------------------------------------------------------------
@@ -882,6 +1008,9 @@
;; Code iterator for logical operations
(define_code_iterator LOGICAL [and ior xor])
+;; LOGICAL without AND.
+(define_code_iterator LOGICAL_OR [ior xor])
+
;; Code iterator for logical operations whose :nlogical works on SIMD registers.
(define_code_iterator NLOGICAL [and ior])
@@ -940,6 +1069,12 @@
;; Unsigned comparison operators.
(define_code_iterator FAC_COMPARISONS [lt le ge gt])
+;; SVE integer unary operations.
+(define_code_iterator SVE_INT_UNARY [neg not popcount])
+
+;; SVE floating-point unary operations.
+(define_code_iterator SVE_FP_UNARY [neg abs sqrt])
+
;; -------------------------------------------------------------------
;; Code Attributes
;; -------------------------------------------------------------------
@@ -956,6 +1091,7 @@
(unsigned_fix "fixuns")
(float "float")
(unsigned_float "floatuns")
+ (popcount "popcount")
(and "and")
(ior "ior")
(xor "xor")
@@ -969,6 +1105,10 @@
(us_minus "qsub")
(ss_neg "qneg")
(ss_abs "qabs")
+ (smin "smin")
+ (smax "smax")
+ (umin "umin")
+ (umax "umax")
(eq "eq")
(ne "ne")
(lt "lt")
@@ -978,7 +1118,9 @@
(ltu "ltu")
(leu "leu")
(geu "geu")
- (gtu "gtu")])
+ (gtu "gtu")
+ (abs "abs")
+ (sqrt "sqrt")])
;; For comparison operators we use the FCM* and CM* instructions.
;; As there are no CMLE or CMLT instructions which act on 3 vector
@@ -1021,9 +1163,12 @@
;; Operation names for negate and bitwise complement.
(define_code_attr neg_not_op [(neg "neg") (not "not")])
-;; Similar, but when not(op)
+;; Similar, but when the second operand is inverted.
(define_code_attr nlogical [(and "bic") (ior "orn") (xor "eon")])
+;; Similar, but when both operands are inverted.
+(define_code_attr logical_nn [(and "nor") (ior "nand")])
+
;; Sign- or zero-extending data-op
(define_code_attr su [(sign_extend "s") (zero_extend "u")
(sign_extract "s") (zero_extract "u")
@@ -1032,6 +1177,9 @@
(smax "s") (umax "u")
(smin "s") (umin "u")])
+;; Whether a shift is left or right.
+(define_code_attr lr [(ashift "l") (ashiftrt "r") (lshiftrt "r")])
+
;; Emit conditional branch instructions.
(define_code_attr bcond [(eq "beq") (ne "bne") (lt "bne") (ge "beq")])
@@ -1077,6 +1225,25 @@
;; Attribute to describe constants acceptable in atomic logical operations
(define_mode_attr lconst_atomic [(QI "K") (HI "K") (SI "K") (DI "L")])
+;; The integer SVE instruction that implements an rtx code.
+(define_code_attr sve_int_op [(plus "add")
+ (neg "neg")
+ (smin "smin")
+ (smax "smax")
+ (umin "umin")
+ (umax "umax")
+ (and "and")
+ (ior "orr")
+ (xor "eor")
+ (not "not")
+ (popcount "cnt")])
+
+;; The floating-point SVE instruction that implements an rtx code.
+(define_code_attr sve_fp_op [(plus "fadd")
+ (neg "fneg")
+ (abs "fabs")
+ (sqrt "fsqrt")])
+
;; -------------------------------------------------------------------
;; Int Iterators.
;; -------------------------------------------------------------------
@@ -1086,6 +1253,8 @@
(define_int_iterator FMAXMINV [UNSPEC_FMAXV UNSPEC_FMINV
UNSPEC_FMAXNMV UNSPEC_FMINNMV])
+(define_int_iterator LOGICALF [UNSPEC_ANDF UNSPEC_IORF UNSPEC_XORF])
+
(define_int_iterator HADDSUB [UNSPEC_SHADD UNSPEC_UHADD
UNSPEC_SRHADD UNSPEC_URHADD
UNSPEC_SHSUB UNSPEC_UHSUB
@@ -1141,6 +1310,9 @@
UNSPEC_TRN1 UNSPEC_TRN2
UNSPEC_UZP1 UNSPEC_UZP2])
+(define_int_iterator OPTAB_PERMUTE [UNSPEC_ZIP1 UNSPEC_ZIP2
+ UNSPEC_UZP1 UNSPEC_UZP2])
+
(define_int_iterator REVERSE [UNSPEC_REV64 UNSPEC_REV32 UNSPEC_REV16])
(define_int_iterator FRINT [UNSPEC_FRINTZ UNSPEC_FRINTP UNSPEC_FRINTM
@@ -1179,6 +1351,21 @@
(define_int_iterator VFMLA16_HIGH [UNSPEC_FMLAL2 UNSPEC_FMLSL2])
+(define_int_iterator UNPACK [UNSPEC_UNPACKSHI UNSPEC_UNPACKUHI
+ UNSPEC_UNPACKSLO UNSPEC_UNPACKULO])
+
+(define_int_iterator UNPACK_UNSIGNED [UNSPEC_UNPACKULO UNSPEC_UNPACKUHI])
+
+(define_int_iterator SVE_COND_INT_CMP [UNSPEC_COND_LT UNSPEC_COND_LE
+ UNSPEC_COND_EQ UNSPEC_COND_NE
+ UNSPEC_COND_GE UNSPEC_COND_GT
+ UNSPEC_COND_LO UNSPEC_COND_LS
+ UNSPEC_COND_HS UNSPEC_COND_HI])
+
+(define_int_iterator SVE_COND_FP_CMP [UNSPEC_COND_LT UNSPEC_COND_LE
+ UNSPEC_COND_EQ UNSPEC_COND_NE
+ UNSPEC_COND_GE UNSPEC_COND_GT])
+
;; Iterators for atomic operations.
(define_int_iterator ATOMIC_LDOP
@@ -1192,6 +1379,14 @@
;; -------------------------------------------------------------------
;; Int Iterators Attributes.
;; -------------------------------------------------------------------
+
+;; The optab associated with an operation. Note that for ANDF, IORF
+;; and XORF, the optab pattern is not actually defined; we just use this
+;; name for consistency with the integer patterns.
+(define_int_attr optab [(UNSPEC_ANDF "and")
+ (UNSPEC_IORF "ior")
+ (UNSPEC_XORF "xor")])
+
(define_int_attr maxmin_uns [(UNSPEC_UMAXV "umax")
(UNSPEC_UMINV "umin")
(UNSPEC_SMAXV "smax")
@@ -1218,6 +1413,17 @@
(UNSPEC_FMAXNM "fmaxnm")
(UNSPEC_FMINNM "fminnm")])
+;; The SVE logical instruction that implements an unspec.
+(define_int_attr logicalf_op [(UNSPEC_ANDF "and")
+ (UNSPEC_IORF "orr")
+ (UNSPEC_XORF "eor")])
+
+;; "s" for signed operations and "u" for unsigned ones.
+(define_int_attr su [(UNSPEC_UNPACKSHI "s")
+ (UNSPEC_UNPACKUHI "u")
+ (UNSPEC_UNPACKSLO "s")
+ (UNSPEC_UNPACKULO "u")])
+
(define_int_attr sur [(UNSPEC_SHADD "s") (UNSPEC_UHADD "u")
(UNSPEC_SRHADD "sr") (UNSPEC_URHADD "ur")
(UNSPEC_SHSUB "s") (UNSPEC_UHSUB "u")
@@ -1328,7 +1534,9 @@
(define_int_attr perm_hilo [(UNSPEC_ZIP1 "1") (UNSPEC_ZIP2 "2")
(UNSPEC_TRN1 "1") (UNSPEC_TRN2 "2")
- (UNSPEC_UZP1 "1") (UNSPEC_UZP2 "2")])
+ (UNSPEC_UZP1 "1") (UNSPEC_UZP2 "2")
+ (UNSPEC_UNPACKSHI "hi") (UNSPEC_UNPACKUHI "hi")
+ (UNSPEC_UNPACKSLO "lo") (UNSPEC_UNPACKULO "lo")])
(define_int_attr frecp_suffix [(UNSPEC_FRECPE "e") (UNSPEC_FRECPX "x")])
@@ -1361,3 +1569,27 @@
(define_int_attr f16mac1 [(UNSPEC_FMLAL "a") (UNSPEC_FMLSL "s")
(UNSPEC_FMLAL2 "a") (UNSPEC_FMLSL2 "s")])
+
+;; The condition associated with an UNSPEC_COND_<xx>.
+(define_int_attr cmp_op [(UNSPEC_COND_LT "lt")
+ (UNSPEC_COND_LE "le")
+ (UNSPEC_COND_EQ "eq")
+ (UNSPEC_COND_NE "ne")
+ (UNSPEC_COND_GE "ge")
+ (UNSPEC_COND_GT "gt")
+ (UNSPEC_COND_LO "lo")
+ (UNSPEC_COND_LS "ls")
+ (UNSPEC_COND_HS "hs")
+ (UNSPEC_COND_HI "hi")])
+
+;; The constraint to use for an UNSPEC_COND_<xx>.
+(define_int_attr imm_con [(UNSPEC_COND_EQ "vsc")
+ (UNSPEC_COND_NE "vsc")
+ (UNSPEC_COND_LT "vsc")
+ (UNSPEC_COND_GE "vsc")
+ (UNSPEC_COND_LE "vsc")
+ (UNSPEC_COND_GT "vsc")
+ (UNSPEC_COND_LO "vsd")
+ (UNSPEC_COND_LS "vsd")
+ (UNSPEC_COND_HS "vsd")
+ (UNSPEC_COND_HI "vsd")])
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 65b2df6..7424f50 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -93,6 +93,10 @@
(define_predicate "aarch64_fp_vec_pow2"
(match_test "aarch64_vec_fpconst_pow_of_2 (op) > 0"))
+(define_predicate "aarch64_sve_cnt_immediate"
+ (and (match_code "const_poly_int")
+ (match_test "aarch64_sve_cnt_immediate_p (op)")))
+
(define_predicate "aarch64_sub_immediate"
(and (match_code "const_int")
(match_test "aarch64_uimm12_shift (-INTVAL (op))")))
@@ -114,9 +118,22 @@
(and (match_operand 0 "aarch64_pluslong_immediate")
(not (match_operand 0 "aarch64_plus_immediate"))))
+(define_predicate "aarch64_sve_addvl_addpl_immediate"
+ (and (match_code "const_poly_int")
+ (match_test "aarch64_sve_addvl_addpl_immediate_p (op)")))
+
+(define_predicate "aarch64_split_add_offset_immediate"
+ (and (match_code "const_poly_int")
+ (match_test "aarch64_add_offset_temporaries (op) == 1")))
+
(define_predicate "aarch64_pluslong_operand"
(ior (match_operand 0 "register_operand")
- (match_operand 0 "aarch64_pluslong_immediate")))
+ (match_operand 0 "aarch64_pluslong_immediate")
+ (match_operand 0 "aarch64_sve_addvl_addpl_immediate")))
+
+(define_predicate "aarch64_pluslong_or_poly_operand"
+ (ior (match_operand 0 "aarch64_pluslong_operand")
+ (match_operand 0 "aarch64_split_add_offset_immediate")))
(define_predicate "aarch64_logical_immediate"
(and (match_code "const_int")
@@ -263,11 +280,18 @@
})
(define_predicate "aarch64_mov_operand"
- (and (match_code "reg,subreg,mem,const,const_int,symbol_ref,label_ref,high")
+ (and (match_code "reg,subreg,mem,const,const_int,symbol_ref,label_ref,high,
+ const_poly_int,const_vector")
(ior (match_operand 0 "register_operand")
(ior (match_operand 0 "memory_operand")
(match_test "aarch64_mov_operand_p (op, mode)")))))
+(define_predicate "aarch64_nonmemory_operand"
+ (and (match_code "reg,subreg,const,const_int,symbol_ref,label_ref,high,
+ const_poly_int,const_vector")
+ (ior (match_operand 0 "register_operand")
+ (match_test "aarch64_mov_operand_p (op, mode)"))))
+
(define_predicate "aarch64_movti_operand"
(and (match_code "reg,subreg,mem,const_int")
(ior (match_operand 0 "register_operand")
@@ -303,6 +327,9 @@
return aarch64_get_condition_code (op) >= 0;
})
+(define_special_predicate "aarch64_equality_operator"
+ (match_code "eq,ne"))
+
(define_special_predicate "aarch64_carry_operation"
(match_code "ne,geu")
{
@@ -342,22 +369,34 @@
})
(define_special_predicate "aarch64_simd_lshift_imm"
- (match_code "const_vector")
+ (match_code "const,const_vector")
{
return aarch64_simd_shift_imm_p (op, mode, true);
})
(define_special_predicate "aarch64_simd_rshift_imm"
- (match_code "const_vector")
+ (match_code "const,const_vector")
{
return aarch64_simd_shift_imm_p (op, mode, false);
})
+(define_predicate "aarch64_simd_imm_zero"
+ (and (match_code "const,const_vector")
+ (match_test "op == CONST0_RTX (GET_MODE (op))")))
+
+(define_predicate "aarch64_simd_or_scalar_imm_zero"
+ (and (match_code "const_int,const_double,const,const_vector")
+ (match_test "op == CONST0_RTX (GET_MODE (op))")))
+
+(define_predicate "aarch64_simd_imm_minus_one"
+ (and (match_code "const,const_vector")
+ (match_test "op == CONSTM1_RTX (GET_MODE (op))")))
+
(define_predicate "aarch64_simd_reg_or_zero"
- (and (match_code "reg,subreg,const_int,const_double,const_vector")
+ (and (match_code "reg,subreg,const_int,const_double,const,const_vector")
(ior (match_operand 0 "register_operand")
- (ior (match_test "op == const0_rtx")
- (match_test "aarch64_simd_imm_zero_p (op, mode)")))))
+ (match_test "op == const0_rtx")
+ (match_operand 0 "aarch64_simd_imm_zero"))))
(define_predicate "aarch64_simd_struct_operand"
(and (match_code "mem")
@@ -377,21 +416,6 @@
|| GET_CODE (XEXP (op, 0)) == POST_INC
|| GET_CODE (XEXP (op, 0)) == REG")))
-(define_special_predicate "aarch64_simd_imm_zero"
- (match_code "const_vector")
-{
- return aarch64_simd_imm_zero_p (op, mode);
-})
-
-(define_special_predicate "aarch64_simd_or_scalar_imm_zero"
- (match_test "aarch64_simd_imm_zero_p (op, mode)"))
-
-(define_special_predicate "aarch64_simd_imm_minus_one"
- (match_code "const_vector")
-{
- return aarch64_const_vec_all_same_int_p (op, -1);
-})
-
;; Predicates used by the various SIMD shift operations. These
;; fall in to 3 categories.
;; Shifts with a range 0-(bit_size - 1) (aarch64_simd_shift_imm)
@@ -448,3 +472,133 @@
(define_predicate "aarch64_constant_pool_symref"
(and (match_code "symbol_ref")
(match_test "CONSTANT_POOL_ADDRESS_P (op)")))
+
+(define_predicate "aarch64_constant_vector_operand"
+ (match_code "const,const_vector"))
+
+(define_predicate "aarch64_sve_ld1r_operand"
+ (and (match_operand 0 "memory_operand")
+ (match_test "aarch64_sve_ld1r_operand_p (op)")))
+
+;; Like memory_operand, but restricted to addresses that are valid for
+;; SVE LDR and STR instructions.
+(define_predicate "aarch64_sve_ldr_operand"
+ (and (match_code "mem")
+ (match_test "aarch64_sve_ldr_operand_p (op)")))
+
+(define_predicate "aarch64_sve_nonimmediate_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_ldr_operand")))
+
+(define_predicate "aarch64_sve_general_operand"
+ (and (match_code "reg,subreg,mem,const,const_vector")
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_ldr_operand")
+ (match_test "aarch64_mov_operand_p (op, mode)"))))
+
+;; Doesn't include immediates, since those are handled by the move
+;; patterns instead.
+(define_predicate "aarch64_sve_dup_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_ld1r_operand")))
+
+(define_predicate "aarch64_sve_arith_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_arith_immediate_p (op, false)")))
+
+(define_predicate "aarch64_sve_sub_arith_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_arith_immediate_p (op, true)")))
+
+(define_predicate "aarch64_sve_inc_dec_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_inc_dec_immediate_p (op)")))
+
+(define_predicate "aarch64_sve_logical_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_bitmask_immediate_p (op)")))
+
+(define_predicate "aarch64_sve_mul_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_const_vec_all_same_in_range_p (op, -128, 127)")))
+
+(define_predicate "aarch64_sve_dup_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_dup_immediate_p (op)")))
+
+(define_predicate "aarch64_sve_cmp_vsc_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_cmp_immediate_p (op, true)")))
+
+(define_predicate "aarch64_sve_cmp_vsd_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_cmp_immediate_p (op, false)")))
+
+(define_predicate "aarch64_sve_index_immediate"
+ (and (match_code "const_int")
+ (match_test "aarch64_sve_index_immediate_p (op)")))
+
+(define_predicate "aarch64_sve_float_arith_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_float_arith_immediate_p (op, false)")))
+
+(define_predicate "aarch64_sve_float_arith_with_sub_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_float_arith_immediate_p (op, true)")))
+
+(define_predicate "aarch64_sve_float_mul_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_float_mul_immediate_p (op)")))
+
+(define_predicate "aarch64_sve_arith_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_arith_immediate")))
+
+(define_predicate "aarch64_sve_add_operand"
+ (ior (match_operand 0 "aarch64_sve_arith_operand")
+ (match_operand 0 "aarch64_sve_sub_arith_immediate")
+ (match_operand 0 "aarch64_sve_inc_dec_immediate")))
+
+(define_predicate "aarch64_sve_logical_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_logical_immediate")))
+
+(define_predicate "aarch64_sve_lshift_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_simd_lshift_imm")))
+
+(define_predicate "aarch64_sve_rshift_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_simd_rshift_imm")))
+
+(define_predicate "aarch64_sve_mul_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_mul_immediate")))
+
+(define_predicate "aarch64_sve_cmp_vsc_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_cmp_vsc_immediate")))
+
+(define_predicate "aarch64_sve_cmp_vsd_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_cmp_vsd_immediate")))
+
+(define_predicate "aarch64_sve_index_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_index_immediate")))
+
+(define_predicate "aarch64_sve_float_arith_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_float_arith_immediate")))
+
+(define_predicate "aarch64_sve_float_arith_with_sub_operand"
+ (ior (match_operand 0 "aarch64_sve_float_arith_operand")
+ (match_operand 0 "aarch64_sve_float_arith_with_sub_immediate")))
+
+(define_predicate "aarch64_sve_float_mul_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_float_mul_immediate")))
+
+(define_predicate "aarch64_sve_vec_perm_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_constant_vector_operand")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 89a4727..28c61a0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14594,6 +14594,23 @@ Permissible values are @samp{none}, which disables return address signing,
functions, and @samp{all}, which enables pointer signing for all functions. The
default value is @samp{none}.
+@item -msve-vector-bits=@var{bits}
+@opindex msve-vector-bits
+Specify the number of bits in an SVE vector register. This option only has
+an effect when SVE is enabled.
+
+GCC supports two forms of SVE code generation: ``vector-length
+agnostic'' output that works with any size of vector register and
+``vector-length specific'' output that only works when the vector
+registers are a particular size. Replacing @var{bits} with
+@samp{scalable} selects vector-length agnostic output while
+replacing it with a number selects vector-length specific output.
+The possible lengths in the latter case are: 128, 256, 512, 1024
+and 2048. @samp{scalable} is the default.
+
+At present, @samp{-msve-vector-bits=128} produces the same output
+as @samp{-msve-vector-bits=scalable}.
+
@end table
@subsubsection @option{-march} and @option{-mcpu} Feature Modifiers
@@ -14617,6 +14634,9 @@ values for options @option{-march} and @option{-mcpu}.
Enable Advanced SIMD instructions. This also enables floating-point
instructions. This is on by default for all possible values for options
@option{-march} and @option{-mcpu}.
+@item sve
+Enable Scalable Vector Extension instructions. This also enables Advanced
+SIMD and floating-point instructions.
@item lse
Enable Large System Extension instructions. This is on by default for
@option{-march=armv8.1-a}.
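To make the new option documentation concrete, a hypothetical usage sketch (file and function names invented): a loop like the one below can be auto-vectorized for SVE with something along the lines of "gcc -O3 -march=armv8-a+sve -msve-vector-bits=scalable", while substituting a specific width such as 256 requests vector-length-specific code.

/* saxpy.c -- illustrative only.  */
void saxpy (float *restrict y, const float *restrict x, float a, int n)
{
  for (int i = 0; i < n; i++)
    y[i] += a * x[i];
}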
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 497df1b..e956c75 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -1735,7 +1735,13 @@ the meanings of that architecture's constraints.
The stack pointer register (@code{SP})
@item w
-Floating point or SIMD vector register
+Floating point register, Advanced SIMD vector register or SVE vector register
+
+@item Upl
+One of the low eight SVE predicate registers (@code{P0} to @code{P7})
+
+@item Upa
+Any of the SVE predicate registers (@code{P0} to @code{P15})
@item I
Integer constant that is valid as an immediate operand in an @code{ADD}