author     Richard Sandiford <richard.sandiford@arm.com>  2019-10-29 09:08:47 +0000
committer  Richard Sandiford <rsandifo@gcc.gnu.org>       2019-10-29 09:08:47 +0000
commit     c600df9a4060da3c6121ff4d0b93f179eafd69d1
tree       3ab057ceb11b2b41ed5b4de9fc9e08b33da249fb /gcc/config/aarch64
parent     624d0f07d51b7fa8bc99142bd0e8380fb9e7badc
[AArch64] Add support for the SVE PCS
The AAPCS64 specifies that if a function takes arguments in SVE registers or returns them in SVE registers, it must preserve all of Z8-Z23 and all of P4-P11.  (Normal functions only preserve the low 64 bits of Z8-Z15 and clobber all of the predicate registers.)  This variation is known informally as the "SVE PCS" and functions that use it are known informally as "SVE functions".

The SVE PCS is mutually interoperable with functions that follow the standard AAPCS64 rules and those that use the aarch64_vector_pcs attribute.  (Note that it's an error to use the attribute for SVE functions.)

One complication -- although it's not really that complicated -- is that SVE registers need to be saved at a VL-dependent offset while other registers need to be saved at a constant offset.  The easiest way of handling this seemed to be to group the SVE registers together below the hard frame pointer.  In common cases, the frame pointer is then usually an easy-to-compute VL multiple above the stack pointer and a constant amount below the incoming stack pointer.

A bigger complication is that, because the base AAPCS64 specifies that only the low 64 bits of V8-V15 are preserved by calls, the associated DWARF frame registers are also treated as 64 bits by the unwinder.  The 64 bits must also have the same layout as they would for a base AAPCS64 function, otherwise unwinding won't work correctly.  (This is actually a problem for the existing aarch64_vector_pcs support too, but I'll fix that separately.)

This falls out naturally for little-endian targets but not for big-endian targets.  The easiest way of meeting the requirement for them was to use ST1D and LD1D to save and restore Z8-Z15, which also has the nice property of storing the 64 bits at the start of the slot.  However, using ST1D and LD1D requires a spare predicate register, and since all of P0-P7 are either argument registers or call-preserved, we may need to spill P4 in order to save the vector registers, even if P4 wouldn't need to be saved otherwise.

Since Z16-Z23 are fully clobbered by base AAPCS64 functions, we don't need to emit frame information for them at all.  This avoids having to decide whether the registers should be treated as having 64 bits (as for Z8-Z15), 128 bits (for Advanced SIMD) or the full SVE width.

There are two ways of dealing with stack-clash protection when saving SVE registers:

(1) If the area between the hard frame pointer and the incoming stack pointer is allocated via a store with writeback (callee_adjust != 0), the SVE save area is allocated separately and becomes the "initial" allocation as far as stack-clash protection goes.  In this case the store with writeback acts as a probe at the hard frame pointer position.

(2) If the area between the hard frame pointer and the incoming stack pointer is allocated via aarch64_allocate_and_probe_stack_space, the SVE save area is added to this initial allocation, so that the SP ends up pointing at the SVE register saves.  It's then necessary to use a temporary base register to save the non-SVE registers.  Setting up this temporary register requires a single instruction only and so should be more efficient than doing two allocations and probes.
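To make the rules above concrete, here is a minimal sketch (not taken from the patch) of the kinds of function that do and do not use the SVE PCS, assuming <arm_sve.h> and a compiler with SVE enabled:

    #include <arm_sve.h>

    /* Takes and returns values in SVE registers, so it follows the SVE PCS:
       callees must preserve Z8-Z23 and P4-P11 for their callers.  */
    svfloat64_t
    scale_add (svfloat64_t x, svfloat64_t y)
    {
      return svadd_x (svptrue_b64 (), x, y);
    }

    /* Takes and returns predicates in P registers, so it also uses the SVE PCS.  */
    svbool_t
    select_pred (svbool_t pa, svbool_t pb, int use_first)
    {
      if (use_first)
        return pa;
      return pb;
    }

    /* Neither takes nor returns SVE values, so it keeps the base AAPCS64
       (or the aarch64_vector_pcs variant, if that attribute is used), even
       if its body happens to use SVE internally.  */
    double
    sum_array (const double *x, int n)
    {
      double total = 0;
      for (int i = 0; i < n; ++i)
        total += x[i];
      return total;
    }

Only the first two functions are subject to the Z8-Z23/P4-P11 preservation requirement and hence to the frame-layout changes described above.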
When SVE registers need to be saved, saving them below the frame pointer makes it harder to rely on the LR save as a stack probe, since the LR register's offset won't usually be a compile-time constant.  The patch copes with that by using the lowest SVE register save as a stack probe too, and thus prevents the save from being shrink-wrapped if stack clash protection is enabled.

The changelog describes the low-level details.

2019-10-29  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
        * calls.c (pass_by_reference): Leave the target to decide whether
        POLY_INT_CST-sized arguments should be passed by value or
        reference, rather than forcing them to be passed by reference.
        (must_pass_in_stack_var_size): Likewise.
        * config/aarch64/aarch64.md (LAST_SAVED_REGNUM): Redefine from
        V31_REGNUM to P15_REGNUM.
        * config/aarch64/aarch64-protos.h (aarch64_init_cumulative_args):
        Take an extra "silent_p" parameter, defaulting to false.
        (aarch64_sve::svbool_type_p): Declare.
        (aarch64_sve::nvectors_if_data_type): Likewise.
        * config/aarch64/aarch64.h (NUM_PR_ARG_REGS): New macro.
        (aarch64_frame::reg_offset): Turn into poly_int64s.
        (aarch64_frame::saved_regs_size): Likewise.
        (aarch64_frame::below_hard_fp_saved_regs_size): New field.
        (aarch64_frame::sve_callee_adjust): Likewise.
        (aarch64_frame::spare_pred_reg): Likewise.
        (ARM_PCS_SVE): New arm_pcs value.
        (CUMULATIVE_ARGS::aapcs_nprn): New field.
        (CUMULATIVE_ARGS::aapcs_nextnprn): Likewise.
        (CUMULATIVE_ARGS::silent_p): Likewise.
        (BITS_PER_SVE_PRED): New macro.
        * config/aarch64/aarch64.c (handle_aarch64_vector_pcs_attribute):
        New function.  Reject aarch64_vector_pcs attributes on SVE
        functions.
        (aarch64_attribute_table): Use the above handler.
        (aarch64_sve_abi): New function.
        (aarch64_sve_argument_p): Likewise.
        (aarch64_returns_value_in_sve_regs_p): Likewise.
        (aarch64_takes_arguments_in_sve_regs_p): Likewise.
        (aarch64_fntype_abi): Check for SVE functions and return the SVE
        PCS descriptor for them.
        (aarch64_simd_decl_p): Delete.
        (aarch64_emit_cfi_for_reg_p): New function.
        (aarch64_reg_save_mode): Remove the fndecl argument and instead
        use crtl->abi to choose the mode for FP registers.  Handle the
        SVE PCS.
        (aarch64_hard_regno_call_part_clobbered): Do not treat FP
        registers as partly clobbered for the SVE PCS.
        (aarch64_function_ok_for_sibcall): Check whether the two functions
        use the same ABI, rather than checking specifically for whether
        they're aarch64_vector_pcs functions.
        (aarch64_pass_by_reference): Raise an error for attempts to pass
        SVE arguments when SVE is disabled.  Pass SVE arguments by
        reference if there are not enough free registers left, or if the
        argument is variadic.
        (aarch64_function_value): Handle SVE predicates, vectors and
        tuples.
        (aarch64_return_in_memory): Do not return SVE predicates, vectors
        and tuples in memory.
        (aarch64_layout_arg): Take a function_arg_info rather than
        individual properties.  Handle SVE predicates, vectors and tuples.
        Raise an error if they are passed to unprototyped functions.
        (aarch64_function_arg): If the silent_p flag is set, suppress the
        usual error about using float registers without TARGET_FLOAT.
        (aarch64_init_cumulative_args): Take a silent_p parameter and
        store it in the cumulative_args structure.  Initialize aapcs_nprn
        and aapcs_nextnprn.  If the silent_p flag is set, suppress the
        usual error about using float registers without TARGET_FLOAT.
        If the silent_p flag is not set, also raise an error about using
        SVE functions when SVE is disabled.
        (aarch64_function_arg_advance): Update the call to
        aarch64_layout_arg, and call it for SVE functions too.  Update
        aapcs_nprn similarly to the other register counts.
        (aarch64_layout_frame): If a big-endian function needs to save
        and restore Z8-Z15, search for a spare predicate that it can use.
        Store SVE predicates at the bottom of the register save area,
        followed by SVE vectors, then followed by the normal slots.
        Keep pointing the hard frame pointer at the base of the normal
        slots, above the SVE vectors.  Update the various frame creation
        and tear-down strategies for the new layout, initializing the new
        sve_callee_adjust field.  Add an additional layout for frames
        whose saved registers are all SVE registers.
        (aarch64_register_saved_on_entry): Cope with poly_int64
        reg_offsets.
        (aarch64_return_address_signing_enabled): Likewise.
        (aarch64_push_regs, aarch64_pop_regs): Update calls to
        aarch64_reg_save_mode.
        (aarch64_adjust_sve_callee_save_base): New function.
        (aarch64_add_cfa_expression): Move earlier in file.  Take the
        saved register as an rtx rather than a register number and use
        its mode for the MEM slot.
        (aarch64_save_callee_saves): Remove the mode argument and instead
        use aarch64_reg_save_mode to get the mode of each save slot.
        Add a hard_fp_valid_p parameter.  Cope with poly_int64 register
        offsets.  Allow GP offsets to be saved at a VL-based offset from
        the stack, handling this case using the frame pointer if
        available or a temporary register otherwise.  Use ST1D to save
        Z8-Z15 for big-endian SVE functions; use normal moves for other
        SVE saves.  Only mark the save as frame-related if
        aarch64_emit_cfi_for_reg_p returns true.  Add explicit CFA notes
        when not storing via the stack pointer.  Do not try to pair SVE
        saves.
        (aarch64_restore_callee_saves): Cope with poly_int64 register
        offsets.  Use LD1D to restore Z8-Z15 for big-endian SVE
        functions; use normal moves for other SVE restores.  Only add
        CFA restore notes if aarch64_emit_cfi_for_reg_p returns true.
        Do not try to pair SVE restores.
        (aarch64_get_separate_components): Always keep the first SVE save
        in the prologue if we need to use it as a stack probe.  Don't
        allow Z8-Z15 saves and loads to be shrink-wrapped for big-endian
        targets.  Likewise the spare predicate register that they need.
        Update the offset calculation to account for the SVE save area.
        Use the appropriate range check for SVE LDR and STR instructions.
        (aarch64_components_for_bb): Cope with poly_int64 reg_offsets.
        (aarch64_process_components): Likewise.  Update the offset
        calculation to account for the SVE save area.  Only mark the
        save as frame-related if aarch64_emit_cfi_for_reg_p returns true.
        Do not try to pair SVE saves.
        (aarch64_allocate_and_probe_stack_space): Cope with poly_int64
        reg_offsets.  When handling the final allocation, expect the
        first SVE register save to be part of the initial allocation
        and for it to act as a probe at SP.  Account for the SVE callee
        save area in the dump information.
        (aarch64_expand_prologue): Update the frame diagram.  Fold the
        SVE callee allocation into the initial allocation if stack clash
        protection is enabled.  Use new variables to track the offset of
        the frame chain (and hard frame pointer) from the current stack
        pointer, and likewise the offset of the bottom of the register
        save area.  Update calls to aarch64_save_callee_saves and
        aarch64_add_cfa_expression.  Apply sve_callee_adjust before
        saving the FP&SIMD registers.  Save the predicate registers.
        (aarch64_expand_epilogue): Take below_hard_fp_saved_regs_size
        into account when setting the stack pointer from the frame
        pointer, and when deciding whether we can inherit the initial
        adjustment amount from the prologue.  Restore the predicate
        registers after the vector registers, then apply
        sve_callee_adjust, then restore the general registers.
        (aarch64_secondary_reload): Don't use secondary SVE reloads
        for VNx16BImode.
        (aapcs_vfp_sub_candidate): Assert that the type is not an SVE
        type.
        (aarch64_short_vector_p): Return false for SVE types.
        (aarch64_vfp_is_call_or_return_candidate): Initialize *is_ha at
        the start of the function.  Return false for SVE types.
        (aarch64_asm_output_variant_pcs): Output .variant_pcs for SVE
        functions too.
        (TARGET_STRICT_ARGUMENT_NAMING): Redefine to request strict
        naming.
        * config/aarch64/aarch64-sve.md (*aarch64_sve_mov<mode>_le):
        Extend to big-endian targets for bytewise moves.
        (*aarch64_sve_mov<mode>_be): Exclude the bytewise case.

gcc/testsuite/
        * gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp: New file.
        * gcc.target/aarch64/sve/pcs/annotate_1.c: New test.
        * gcc.target/aarch64/sve/pcs/annotate_2.c: Likewise.
        * gcc.target/aarch64/sve/pcs/annotate_3.c: Likewise.
        * gcc.target/aarch64/sve/pcs/annotate_4.c: Likewise.
        * gcc.target/aarch64/sve/pcs/annotate_5.c: Likewise.
        * gcc.target/aarch64/sve/pcs/annotate_6.c: Likewise.
        * gcc.target/aarch64/sve/pcs/annotate_7.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_1.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_10.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_11_nosc.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_11_sc.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_2.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_3.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_4.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_be_f16.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_be_f32.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_be_f64.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_be_s16.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_be_s32.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_be_s64.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_be_s8.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_be_u16.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_be_u32.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_be_u64.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_be_u8.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_le_f16.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_le_f32.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_le_f64.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_le_s16.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_le_s32.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_le_s64.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_le_s8.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_le_u16.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_le_u32.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_le_u64.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_5_le_u8.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_be_f16.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_be_f32.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_be_f64.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_be_s16.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_be_s32.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_be_s64.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_be_s8.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_be_u16.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_be_u32.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_be_u64.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_be_u8.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_le_f16.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_le_f32.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_le_f64.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_le_s16.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_le_s32.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_le_s64.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_le_s8.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_le_u16.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_le_u32.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_le_u64.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_6_le_u8.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_7.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_8.c: Likewise.
        * gcc.target/aarch64/sve/pcs/args_9.c: Likewise.
        * gcc.target/aarch64/sve/pcs/nosve_1.c: Likewise.
        * gcc.target/aarch64/sve/pcs/nosve_2.c: Likewise.
        * gcc.target/aarch64/sve/pcs/nosve_3.c: Likewise.
        * gcc.target/aarch64/sve/pcs/nosve_4.c: Likewise.
        * gcc.target/aarch64/sve/pcs/nosve_5.c: Likewise.
        * gcc.target/aarch64/sve/pcs/nosve_6.c: Likewise.
        * gcc.target/aarch64/sve/pcs/nosve_7.c: Likewise.
        * gcc.target/aarch64/sve/pcs/nosve_8.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_1.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_1_1024.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_1_2048.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_1_256.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_1_512.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_2.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_3.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_4.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_4_1024.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_4_2048.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_4_256.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_4_512.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_5.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_5_1024.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_5_2048.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_5_256.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_5_512.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_6.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_6_1024.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_6_2048.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_6_256.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_6_512.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_7.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_8.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_9.c: Likewise.
        * gcc.target/aarch64/sve/pcs/saves_1_be_nowrap.c: Likewise.
        * gcc.target/aarch64/sve/pcs/saves_1_be_wrap.c: Likewise.
        * gcc.target/aarch64/sve/pcs/saves_1_le_nowrap.c: Likewise.
        * gcc.target/aarch64/sve/pcs/saves_1_le_wrap.c: Likewise.
        * gcc.target/aarch64/sve/pcs/saves_2_be_nowrap.c: Likewise.
        * gcc.target/aarch64/sve/pcs/saves_2_be_wrap.c: Likewise.
        * gcc.target/aarch64/sve/pcs/saves_2_le_nowrap.c: Likewise.
        * gcc.target/aarch64/sve/pcs/saves_2_le_wrap.c: Likewise.
        * gcc.target/aarch64/sve/pcs/saves_3.c: Likewise.
        * gcc.target/aarch64/sve/pcs/saves_4_be.c: Likewise.
        * gcc.target/aarch64/sve/pcs/saves_4_le.c: Likewise.
        * gcc.target/aarch64/sve/pcs/saves_5_be.c: Likewise.
        * gcc.target/aarch64/sve/pcs/saves_5_le.c: Likewise.
        * gcc.target/aarch64/sve/pcs/stack_clash_1.c: Likewise.
        * gcc.target/aarch64/sve/pcs/stack_clash_1_256.c: Likewise.
        * gcc.target/aarch64/sve/pcs/stack_clash_1_512.c: Likewise.
        * gcc.target/aarch64/sve/pcs/stack_clash_1_1024.c: Likewise.
        * gcc.target/aarch64/sve/pcs/stack_clash_1_2048.c: Likewise.
        * gcc.target/aarch64/sve/pcs/stack_clash_2.c: Likewise.
        * gcc.target/aarch64/sve/pcs/stack_clash_2_256.c: Likewise.
        * gcc.target/aarch64/sve/pcs/stack_clash_2_512.c: Likewise.
        * gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c: Likewise.
        * gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c: Likewise.
        * gcc.target/aarch64/sve/pcs/stack_clash_3.c: Likewise.
        * gcc.target/aarch64/sve/pcs/unprototyped_1.c: Likewise.
        * gcc.target/aarch64/sve/pcs/varargs_1.c: Likewise.
        * gcc.target/aarch64/sve/pcs/varargs_2_f16.c: Likewise.
        * gcc.target/aarch64/sve/pcs/varargs_2_f32.c: Likewise.
        * gcc.target/aarch64/sve/pcs/varargs_2_f64.c: Likewise.
        * gcc.target/aarch64/sve/pcs/varargs_2_s16.c: Likewise.
        * gcc.target/aarch64/sve/pcs/varargs_2_s32.c: Likewise.
        * gcc.target/aarch64/sve/pcs/varargs_2_s64.c: Likewise.
        * gcc.target/aarch64/sve/pcs/varargs_2_s8.c: Likewise.
        * gcc.target/aarch64/sve/pcs/varargs_2_u16.c: Likewise.
        * gcc.target/aarch64/sve/pcs/varargs_2_u32.c: Likewise.
        * gcc.target/aarch64/sve/pcs/varargs_2_u64.c: Likewise.
        * gcc.target/aarch64/sve/pcs/varargs_2_u8.c: Likewise.
        * gcc.target/aarch64/sve/pcs/varargs_3_nosc.c: Likewise.
        * gcc.target/aarch64/sve/pcs/varargs_3_sc.c: Likewise.
        * gcc.target/aarch64/sve/pcs/vpcs_1.c: Likewise.
        * g++.target/aarch64/sve/catch_7.C: Likewise.

From-SVN: r277564
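Several of the new tests above (nosve_*.c, unprototyped_1.c, vpcs_1.c) exercise the diagnostics added by the patch.  A rough, hypothetical sketch of the situations they cover (not the actual test contents; it assumes <arm_sve.h>):

    #include <arm_sve.h>

    /* Rejected by the new attribute handler: the function type already uses
       the SVE PCS, so adding aarch64_vector_pcs to it is an error.  */
    svint32_t sve_fn (svint32_t) __attribute__ ((aarch64_vector_pcs));

    /* The PCS gives SVE arguments no defined location in a call to an
       unprototyped function, so this call is diagnosed.  */
    void unprototyped_callee ();

    void
    call_unprototyped (svint32_t x)
    {
      unprototyped_callee (x);
    }

    /* In addition, if SVE is not enabled (e.g. -march=armv8.2-a), declaring
       or calling functions such as sve_fn above is reported as requiring
       the SVE ISA extension.  */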
Diffstat (limited to 'gcc/config/aarch64')
-rw-r--r--  gcc/config/aarch64/aarch64-protos.h |   4
-rw-r--r--  gcc/config/aarch64/aarch64-sve.md   |  12
-rw-r--r--  gcc/config/aarch64/aarch64.c        | 994
-rw-r--r--  gcc/config/aarch64/aarch64.h        |  30
-rw-r--r--  gcc/config/aarch64/aarch64.md       |   2
5 files changed, 819 insertions, 223 deletions
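On the caller side, the practical effect of the new call-clobber information (implemented in the diff below) is that values can be kept in Z8-Z23 and P4-P11 across calls to SVE functions instead of being spilled.  A minimal, hypothetical sketch along the lines of the saves_*.c tests (assuming <arm_sve.h>):

    #include <arm_sve.h>

    /* CONSUME takes an SVE argument, so it follows the SVE PCS and
       therefore preserves Z8-Z23 and P4-P11.  */
    void consume (svfloat64_t);

    svfloat64_t
    keep_live (svfloat64_t x)
    {
      /* X can stay in a call-preserved Z register (Z8, say) across this
         call instead of being saved to and restored from the stack.  */
      consume (svadd_x (svptrue_b64 (), x, x));
      return x;
    }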
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 15f288b..1d4f4fd 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -617,7 +617,7 @@ void aarch64_expand_prologue (void);
void aarch64_expand_vector_init (rtx, rtx);
void aarch64_sve_expand_vector_init (rtx, rtx);
void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
- const_tree, unsigned);
+ const_tree, unsigned, bool = false);
void aarch64_init_expanders (void);
void aarch64_init_simd_builtins (void);
void aarch64_emit_call_insn (rtx);
@@ -705,6 +705,8 @@ namespace aarch64_sve {
void handle_arm_sve_h ();
tree builtin_decl (unsigned, bool);
bool builtin_type_p (const_tree);
+ bool svbool_type_p (const_tree);
+ unsigned int nvectors_if_data_type (const_tree);
const char *mangle_builtin_type (const_tree);
tree resolve_overloaded_builtin (location_t, unsigned int,
vec<tree, va_gc> *);
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 57db06a..0cda882 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -586,14 +586,14 @@
}
)
-;; Unpredicated moves (little-endian). Only allow memory operations
-;; during and after RA; before RA we want the predicated load and
-;; store patterns to be used instead.
+;; Unpredicated moves (bytes or little-endian). Only allow memory operations
+;; during and after RA; before RA we want the predicated load and store
+;; patterns to be used instead.
(define_insn "*aarch64_sve_mov<mode>_le"
[(set (match_operand:SVE_ALL 0 "aarch64_sve_nonimmediate_operand" "=w, Utr, w, w")
(match_operand:SVE_ALL 1 "aarch64_sve_general_operand" "Utr, w, w, Dn"))]
"TARGET_SVE
- && !BYTES_BIG_ENDIAN
+ && (<MODE>mode == VNx16QImode || !BYTES_BIG_ENDIAN)
&& ((lra_in_progress || reload_completed)
|| (register_operand (operands[0], <MODE>mode)
&& nonmemory_operand (operands[1], <MODE>mode)))"
@@ -604,12 +604,12 @@
* return aarch64_output_sve_mov_immediate (operands[1]);"
)
-;; Unpredicated moves (big-endian). Memory accesses require secondary
+;; Unpredicated moves (non-byte big-endian). Memory accesses require secondary
;; reloads.
(define_insn "*aarch64_sve_mov<mode>_be"
[(set (match_operand:SVE_ALL 0 "register_operand" "=w, w")
(match_operand:SVE_ALL 1 "aarch64_nonmemory_operand" "w, Dn"))]
- "TARGET_SVE && BYTES_BIG_ENDIAN"
+ "TARGET_SVE && BYTES_BIG_ENDIAN && <MODE>mode != VNx16QImode"
"@
mov\t%0.d, %1.d
* return aarch64_output_sve_mov_immediate (operands[1]);"
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9cafef4..599d07a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1212,12 +1212,41 @@ enum aarch64_key_type aarch64_ra_sign_key = AARCH64_KEY_A;
/* The current tuning set. */
struct tune_params aarch64_tune_params = generic_tunings;
+/* Check whether an 'aarch64_vector_pcs' attribute is valid. */
+
+static tree
+handle_aarch64_vector_pcs_attribute (tree *node, tree name, tree,
+ int, bool *no_add_attrs)
+{
+ /* Since we set fn_type_req to true, the caller should have checked
+ this for us. */
+ gcc_assert (FUNC_OR_METHOD_TYPE_P (*node));
+ switch ((arm_pcs) fntype_abi (*node).id ())
+ {
+ case ARM_PCS_AAPCS64:
+ case ARM_PCS_SIMD:
+ return NULL_TREE;
+
+ case ARM_PCS_SVE:
+ error ("the %qE attribute cannot be applied to an SVE function type",
+ name);
+ *no_add_attrs = true;
+ return NULL_TREE;
+
+ case ARM_PCS_TLSDESC:
+ case ARM_PCS_UNKNOWN:
+ break;
+ }
+ gcc_unreachable ();
+}
+
/* Table of machine attributes. */
static const struct attribute_spec aarch64_attribute_table[] =
{
/* { name, min_len, max_len, decl_req, type_req, fn_type_req,
affects_type_identity, handler, exclude } */
- { "aarch64_vector_pcs", 0, 0, false, true, true, true, NULL, NULL },
+ { "aarch64_vector_pcs", 0, 0, false, true, true, true,
+ handle_aarch64_vector_pcs_attribute, NULL },
{ NULL, 0, 0, false, false, false, false, NULL, NULL }
};
@@ -1384,6 +1413,25 @@ aarch64_simd_abi (void)
return simd_abi;
}
+/* Return the descriptor of the SVE PCS. */
+
+static const predefined_function_abi &
+aarch64_sve_abi (void)
+{
+ predefined_function_abi &sve_abi = function_abis[ARM_PCS_SVE];
+ if (!sve_abi.initialized_p ())
+ {
+ HARD_REG_SET full_reg_clobbers
+ = default_function_abi.full_reg_clobbers ();
+ for (int regno = V8_REGNUM; regno <= V23_REGNUM; ++regno)
+ CLEAR_HARD_REG_BIT (full_reg_clobbers, regno);
+ for (int regno = P4_REGNUM; regno <= P11_REGNUM; ++regno)
+ CLEAR_HARD_REG_BIT (full_reg_clobbers, regno);
+ sve_abi.initialize (ARM_PCS_SVE, full_reg_clobbers);
+ }
+ return sve_abi;
+}
+
/* Generate code to enable conditional branches in functions over 1 MiB. */
const char *
aarch64_gen_far_branch (rtx * operands, int pos_label, const char * dest,
@@ -1878,6 +1926,74 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
return false;
}
+/* Return true if TYPE is a type that should be passed or returned in
+ SVE registers, assuming enough registers are available. When returning
+ true, set *NUM_ZR and *NUM_PR to the number of required Z and P registers
+ respectively. */
+
+static bool
+aarch64_sve_argument_p (const_tree type, unsigned int *num_zr,
+ unsigned int *num_pr)
+{
+ if (aarch64_sve::svbool_type_p (type))
+ {
+ *num_pr = 1;
+ *num_zr = 0;
+ return true;
+ }
+
+ if (unsigned int nvectors = aarch64_sve::nvectors_if_data_type (type))
+ {
+ *num_pr = 0;
+ *num_zr = nvectors;
+ return true;
+ }
+
+ return false;
+}
+
+/* Return true if a function with type FNTYPE returns its value in
+ SVE vector or predicate registers. */
+
+static bool
+aarch64_returns_value_in_sve_regs_p (const_tree fntype)
+{
+ unsigned int num_zr, num_pr;
+ tree return_type = TREE_TYPE (fntype);
+ return (return_type != error_mark_node
+ && aarch64_sve_argument_p (return_type, &num_zr, &num_pr));
+}
+
+/* Return true if a function with type FNTYPE takes arguments in
+ SVE vector or predicate registers. */
+
+static bool
+aarch64_takes_arguments_in_sve_regs_p (const_tree fntype)
+{
+ CUMULATIVE_ARGS args_so_far_v;
+ aarch64_init_cumulative_args (&args_so_far_v, NULL_TREE, NULL_RTX,
+ NULL_TREE, 0, true);
+ cumulative_args_t args_so_far = pack_cumulative_args (&args_so_far_v);
+
+ for (tree chain = TYPE_ARG_TYPES (fntype);
+ chain && chain != void_list_node;
+ chain = TREE_CHAIN (chain))
+ {
+ tree arg_type = TREE_VALUE (chain);
+ if (arg_type == error_mark_node)
+ return false;
+
+ function_arg_info arg (arg_type, /*named=*/true);
+ apply_pass_by_reference_rules (&args_so_far_v, arg);
+ unsigned int num_zr, num_pr;
+ if (aarch64_sve_argument_p (arg.type, &num_zr, &num_pr))
+ return true;
+
+ targetm.calls.function_arg_advance (args_so_far, arg);
+ }
+ return false;
+}
+
/* Implement TARGET_FNTYPE_ABI. */
static const predefined_function_abi &
@@ -1885,40 +2001,65 @@ aarch64_fntype_abi (const_tree fntype)
{
if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
return aarch64_simd_abi ();
+
+ if (aarch64_returns_value_in_sve_regs_p (fntype)
+ || aarch64_takes_arguments_in_sve_regs_p (fntype))
+ return aarch64_sve_abi ();
+
return default_function_abi;
}
-/* Return true if this is a definition of a vectorized simd function. */
+/* Return true if we should emit CFI for register REGNO. */
static bool
-aarch64_simd_decl_p (tree fndecl)
+aarch64_emit_cfi_for_reg_p (unsigned int regno)
{
- tree fntype;
-
- if (fndecl == NULL)
- return false;
- fntype = TREE_TYPE (fndecl);
- if (fntype == NULL)
- return false;
-
- /* Functions with the aarch64_vector_pcs attribute use the simd ABI. */
- if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)) != NULL)
- return true;
-
- return false;
+ return (GP_REGNUM_P (regno)
+ || !default_function_abi.clobbers_full_reg_p (regno));
}
-/* Return the mode a register save/restore should use. DImode for integer
- registers, DFmode for FP registers in non-SIMD functions (they only save
- the bottom half of a 128 bit register), or TFmode for FP registers in
- SIMD functions. */
+/* Return the mode we should use to save and restore register REGNO. */
static machine_mode
-aarch64_reg_save_mode (tree fndecl, unsigned regno)
+aarch64_reg_save_mode (unsigned int regno)
{
- return GP_REGNUM_P (regno)
- ? E_DImode
- : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
+ if (GP_REGNUM_P (regno))
+ return DImode;
+
+ if (FP_REGNUM_P (regno))
+ switch (crtl->abi->id ())
+ {
+ case ARM_PCS_AAPCS64:
+ /* Only the low 64 bits are saved by the base PCS. */
+ return DFmode;
+
+ case ARM_PCS_SIMD:
+ /* The vector PCS saves the low 128 bits (which is the full
+ register on non-SVE targets). */
+ return TFmode;
+
+ case ARM_PCS_SVE:
+ /* Use vectors of DImode for registers that need frame
+ information, so that the first 64 bits of the save slot
+ are always the equivalent of what storing D<n> would give. */
+ if (aarch64_emit_cfi_for_reg_p (regno))
+ return VNx2DImode;
+
+ /* Use vectors of bytes otherwise, so that the layout is
+ endian-agnostic, and so that we can use LDR and STR for
+ big-endian targets. */
+ return VNx16QImode;
+
+ case ARM_PCS_TLSDESC:
+ case ARM_PCS_UNKNOWN:
+ break;
+ }
+
+ if (PR_REGNUM_P (regno))
+ /* Save the full predicate register. */
+ return VNx16BImode;
+
+ gcc_unreachable ();
}
/* Implement TARGET_INSN_CALLEE_ABI. */
@@ -1943,7 +2084,7 @@ aarch64_hard_regno_call_part_clobbered (unsigned int abi_id,
unsigned int regno,
machine_mode mode)
{
- if (FP_REGNUM_P (regno))
+ if (FP_REGNUM_P (regno) && abi_id != ARM_PCS_SVE)
{
poly_int64 per_register_size = GET_MODE_SIZE (mode);
unsigned int nregs = hard_regno_nregs (regno, mode);
@@ -4582,10 +4723,9 @@ aarch64_split_sve_subreg_move (rtx dest, rtx ptrue, rtx src)
}
static bool
-aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
- tree exp ATTRIBUTE_UNUSED)
+aarch64_function_ok_for_sibcall (tree, tree exp)
{
- if (aarch64_simd_decl_p (cfun->decl) != aarch64_simd_decl_p (decl))
+ if (crtl->abi->id () != expr_callee_abi (exp).id ())
return false;
return true;
@@ -4594,12 +4734,30 @@ aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
/* Implement TARGET_PASS_BY_REFERENCE. */
static bool
-aarch64_pass_by_reference (cumulative_args_t, const function_arg_info &arg)
+aarch64_pass_by_reference (cumulative_args_t pcum_v,
+ const function_arg_info &arg)
{
+ CUMULATIVE_ARGS *pcum = get_cumulative_args (pcum_v);
HOST_WIDE_INT size;
machine_mode dummymode;
int nregs;
+ unsigned int num_zr, num_pr;
+ if (arg.type && aarch64_sve_argument_p (arg.type, &num_zr, &num_pr))
+ {
+ if (pcum && !pcum->silent_p && !TARGET_SVE)
+ /* We can't gracefully recover at this point, so make this a
+ fatal error. */
+ fatal_error (input_location, "arguments of type %qT require"
+ " the SVE ISA extension", arg.type);
+
+ /* Variadic SVE types are passed by reference. Normal non-variadic
+ arguments are too if we've run out of registers. */
+ return (!arg.named
+ || pcum->aapcs_nvrn + num_zr > NUM_FP_ARG_REGS
+ || pcum->aapcs_nprn + num_pr > NUM_PR_ARG_REGS);
+ }
+
/* GET_MODE_SIZE (BLKmode) is useless since it is 0. */
if (arg.mode == BLKmode && arg.type)
size = int_size_in_bytes (arg.type);
@@ -4673,6 +4831,29 @@ aarch64_function_value (const_tree type, const_tree func,
if (INTEGRAL_TYPE_P (type))
mode = promote_function_mode (type, mode, &unsignedp, func, 1);
+ unsigned int num_zr, num_pr;
+ if (type && aarch64_sve_argument_p (type, &num_zr, &num_pr))
+ {
+ /* Don't raise an error here if we're called when SVE is disabled,
+ since this is really just a query function. Other code must
+ do that where appropriate. */
+ mode = TYPE_MODE_RAW (type);
+ gcc_assert (VECTOR_MODE_P (mode)
+ && (!TARGET_SVE || aarch64_sve_mode_p (mode)));
+
+ if (num_zr > 0 && num_pr == 0)
+ return gen_rtx_REG (mode, V0_REGNUM);
+
+ if (num_zr == 0 && num_pr == 1)
+ return gen_rtx_REG (mode, P0_REGNUM);
+
+ gcc_unreachable ();
+ }
+
+ /* Generic vectors that map to SVE modes with -msve-vector-bits=N are
+ returned in memory, not by value. */
+ gcc_assert (!aarch64_sve_mode_p (mode));
+
if (aarch64_return_in_msb (type))
{
HOST_WIDE_INT size = int_size_in_bytes (type);
@@ -4755,6 +4936,16 @@ aarch64_return_in_memory (const_tree type, const_tree fndecl ATTRIBUTE_UNUSED)
/* Simple scalar types always returned in registers. */
return false;
+ unsigned int num_zr, num_pr;
+ if (type && aarch64_sve_argument_p (type, &num_zr, &num_pr))
+ {
+ /* All SVE types we support fit in registers. For example, it isn't
+ yet possible to define an aggregate of 9+ SVE vectors or 5+ SVE
+ predicates. */
+ gcc_assert (num_zr <= NUM_FP_ARG_REGS && num_pr <= NUM_PR_ARG_REGS);
+ return false;
+ }
+
if (aarch64_vfp_is_call_or_return_candidate (TYPE_MODE (type),
type,
&ag_mode,
@@ -4830,11 +5021,11 @@ aarch64_function_arg_alignment (machine_mode mode, const_tree type,
numbers refer to the rule numbers in the AAPCS64. */
static void
-aarch64_layout_arg (cumulative_args_t pcum_v, machine_mode mode,
- const_tree type,
- bool named ATTRIBUTE_UNUSED)
+aarch64_layout_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
{
CUMULATIVE_ARGS *pcum = get_cumulative_args (pcum_v);
+ tree type = arg.type;
+ machine_mode mode = arg.mode;
int ncrn, nvrn, nregs;
bool allocate_ncrn, allocate_nvrn;
HOST_WIDE_INT size;
@@ -4846,6 +5037,46 @@ aarch64_layout_arg (cumulative_args_t pcum_v, machine_mode mode,
pcum->aapcs_arg_processed = true;
+ unsigned int num_zr, num_pr;
+ if (type && aarch64_sve_argument_p (type, &num_zr, &num_pr))
+ {
+ /* The PCS says that it is invalid to pass an SVE value to an
+ unprototyped function. There is no ABI-defined location we
+ can return in this case, so we have no real choice but to raise
+ an error immediately, even though this is only a query function. */
+ if (arg.named && pcum->pcs_variant != ARM_PCS_SVE)
+ {
+ gcc_assert (!pcum->silent_p);
+ error ("SVE type %qT cannot be passed to an unprototyped function",
+ arg.type);
+ /* Avoid repeating the message, and avoid tripping the assert
+ below. */
+ pcum->pcs_variant = ARM_PCS_SVE;
+ }
+
+ /* We would have converted the argument into pass-by-reference
+ form if it didn't fit in registers. */
+ pcum->aapcs_nextnvrn = pcum->aapcs_nvrn + num_zr;
+ pcum->aapcs_nextnprn = pcum->aapcs_nprn + num_pr;
+ gcc_assert (arg.named
+ && pcum->pcs_variant == ARM_PCS_SVE
+ && aarch64_sve_mode_p (mode)
+ && pcum->aapcs_nextnvrn <= NUM_FP_ARG_REGS
+ && pcum->aapcs_nextnprn <= NUM_PR_ARG_REGS);
+
+ if (num_zr > 0 && num_pr == 0)
+ pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + pcum->aapcs_nvrn);
+ else if (num_zr == 0 && num_pr == 1)
+ pcum->aapcs_reg = gen_rtx_REG (mode, P0_REGNUM + pcum->aapcs_nprn);
+ else
+ gcc_unreachable ();
+ return;
+ }
+
+ /* Generic vectors that map to SVE modes with -msve-vector-bits=N are
+ passed by reference, not by value. */
+ gcc_assert (!aarch64_sve_mode_p (mode));
+
/* Size in bytes, rounded to the nearest multiple of 8 bytes. */
if (type)
size = int_size_in_bytes (type);
@@ -4870,7 +5101,7 @@ aarch64_layout_arg (cumulative_args_t pcum_v, machine_mode mode,
and homogenous short-vector aggregates (HVA). */
if (allocate_nvrn)
{
- if (!TARGET_FLOAT)
+ if (!pcum->silent_p && !TARGET_FLOAT)
aarch64_err_no_fpadvsimd (mode);
if (nvrn + nregs <= NUM_FP_ARG_REGS)
@@ -4990,12 +5221,13 @@ aarch64_function_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
{
CUMULATIVE_ARGS *pcum = get_cumulative_args (pcum_v);
gcc_assert (pcum->pcs_variant == ARM_PCS_AAPCS64
- || pcum->pcs_variant == ARM_PCS_SIMD);
+ || pcum->pcs_variant == ARM_PCS_SIMD
+ || pcum->pcs_variant == ARM_PCS_SVE);
if (arg.end_marker_p ())
return gen_int_mode (pcum->pcs_variant, DImode);
- aarch64_layout_arg (pcum_v, arg.mode, arg.type, arg.named);
+ aarch64_layout_arg (pcum_v, arg);
return pcum->aapcs_reg;
}
@@ -5004,12 +5236,15 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum,
const_tree fntype,
rtx libname ATTRIBUTE_UNUSED,
const_tree fndecl ATTRIBUTE_UNUSED,
- unsigned n_named ATTRIBUTE_UNUSED)
+ unsigned n_named ATTRIBUTE_UNUSED,
+ bool silent_p)
{
pcum->aapcs_ncrn = 0;
pcum->aapcs_nvrn = 0;
+ pcum->aapcs_nprn = 0;
pcum->aapcs_nextncrn = 0;
pcum->aapcs_nextnvrn = 0;
+ pcum->aapcs_nextnprn = 0;
if (fntype)
pcum->pcs_variant = (arm_pcs) fntype_abi (fntype).id ();
else
@@ -5018,8 +5253,10 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum,
pcum->aapcs_arg_processed = false;
pcum->aapcs_stack_words = 0;
pcum->aapcs_stack_size = 0;
+ pcum->silent_p = silent_p;
- if (!TARGET_FLOAT
+ if (!silent_p
+ && !TARGET_FLOAT
&& fndecl && TREE_PUBLIC (fndecl)
&& fntype && fntype != error_mark_node)
{
@@ -5030,7 +5267,20 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum,
&mode, &nregs, NULL))
aarch64_err_no_fpadvsimd (TYPE_MODE (type));
}
- return;
+
+ if (!silent_p
+ && !TARGET_SVE
+ && pcum->pcs_variant == ARM_PCS_SVE)
+ {
+ /* We can't gracefully recover at this point, so make this a
+ fatal error. */
+ if (fndecl)
+ fatal_error (input_location, "%qE requires the SVE ISA extension",
+ fndecl);
+ else
+ fatal_error (input_location, "calls to functions of type %qT require"
+ " the SVE ISA extension", fntype);
+ }
}
static void
@@ -5039,14 +5289,16 @@ aarch64_function_arg_advance (cumulative_args_t pcum_v,
{
CUMULATIVE_ARGS *pcum = get_cumulative_args (pcum_v);
if (pcum->pcs_variant == ARM_PCS_AAPCS64
- || pcum->pcs_variant == ARM_PCS_SIMD)
+ || pcum->pcs_variant == ARM_PCS_SIMD
+ || pcum->pcs_variant == ARM_PCS_SVE)
{
- aarch64_layout_arg (pcum_v, arg.mode, arg.type, arg.named);
+ aarch64_layout_arg (pcum_v, arg);
gcc_assert ((pcum->aapcs_reg != NULL_RTX)
!= (pcum->aapcs_stack_words != 0));
pcum->aapcs_arg_processed = false;
pcum->aapcs_ncrn = pcum->aapcs_nextncrn;
pcum->aapcs_nvrn = pcum->aapcs_nextnvrn;
+ pcum->aapcs_nprn = pcum->aapcs_nextnprn;
pcum->aapcs_stack_size += pcum->aapcs_stack_words;
pcum->aapcs_stack_words = 0;
pcum->aapcs_reg = NULL_RTX;
@@ -5479,9 +5731,11 @@ aarch64_needs_frame_chain (void)
static void
aarch64_layout_frame (void)
{
- HOST_WIDE_INT offset = 0;
+ poly_int64 offset = 0;
int regno, last_fp_reg = INVALID_REGNUM;
- bool simd_function = (crtl->abi->id () == ARM_PCS_SIMD);
+ machine_mode vector_save_mode = aarch64_reg_save_mode (V8_REGNUM);
+ poly_int64 vector_save_size = GET_MODE_SIZE (vector_save_mode);
+ bool frame_related_fp_reg_p = false;
aarch64_frame &frame = cfun->machine->frame;
frame.emit_frame_chain = aarch64_needs_frame_chain ();
@@ -5495,12 +5749,10 @@ aarch64_layout_frame (void)
frame.wb_candidate1 = INVALID_REGNUM;
frame.wb_candidate2 = INVALID_REGNUM;
+ frame.spare_pred_reg = INVALID_REGNUM;
/* First mark all the registers that really need to be saved... */
- for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
- frame.reg_offset[regno] = SLOT_NOT_REQUIRED;
-
- for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
+ for (regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
frame.reg_offset[regno] = SLOT_NOT_REQUIRED;
/* ... that includes the eh data registers (if needed)... */
@@ -5523,25 +5775,83 @@ aarch64_layout_frame (void)
{
frame.reg_offset[regno] = SLOT_REQUIRED;
last_fp_reg = regno;
+ if (aarch64_emit_cfi_for_reg_p (regno))
+ frame_related_fp_reg_p = true;
}
+ /* Big-endian SVE frames need a spare predicate register in order
+ to save Z8-Z15. Decide which register they should use. Prefer
+ an unused argument register if possible, so that we don't force P4
+ to be saved unnecessarily. */
+ if (frame_related_fp_reg_p
+ && crtl->abi->id () == ARM_PCS_SVE
+ && BYTES_BIG_ENDIAN)
+ {
+ bitmap live1 = df_get_live_out (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+ bitmap live2 = df_get_live_in (EXIT_BLOCK_PTR_FOR_FN (cfun));
+ for (regno = P0_REGNUM; regno <= P7_REGNUM; regno++)
+ if (!bitmap_bit_p (live1, regno) && !bitmap_bit_p (live2, regno))
+ break;
+ gcc_assert (regno <= P7_REGNUM);
+ frame.spare_pred_reg = regno;
+ df_set_regs_ever_live (regno, true);
+ }
+
+ for (regno = P0_REGNUM; regno <= P15_REGNUM; regno++)
+ if (df_regs_ever_live_p (regno)
+ && !fixed_regs[regno]
+ && !crtl->abi->clobbers_full_reg_p (regno))
+ frame.reg_offset[regno] = SLOT_REQUIRED;
+
+ /* With stack-clash, LR must be saved in non-leaf functions. */
+ gcc_assert (crtl->is_leaf
+ || maybe_ne (frame.reg_offset[R30_REGNUM], SLOT_NOT_REQUIRED));
+
+ /* Now assign stack slots for the registers. Start with the predicate
+ registers, since predicate LDR and STR have a relatively small
+ offset range. These saves happen below the hard frame pointer. */
+ for (regno = P0_REGNUM; regno <= P15_REGNUM; regno++)
+ if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
+ {
+ frame.reg_offset[regno] = offset;
+ offset += BYTES_PER_SVE_PRED;
+ }
+
+ /* We save a maximum of 8 predicate registers, and since vector
+ registers are 8 times the size of a predicate register, all the
+ saved predicates fit within a single vector. Doing this also
+ rounds the offset to a 128-bit boundary. */
+ if (maybe_ne (offset, 0))
+ {
+ gcc_assert (known_le (offset, vector_save_size));
+ offset = vector_save_size;
+ }
+
+ /* If we need to save any SVE vector registers, add them next. */
+ if (last_fp_reg != (int) INVALID_REGNUM && crtl->abi->id () == ARM_PCS_SVE)
+ for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
+ if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
+ {
+ frame.reg_offset[regno] = offset;
+ offset += vector_save_size;
+ }
+
+ /* OFFSET is now the offset of the hard frame pointer from the bottom
+ of the callee save area. */
+ bool saves_below_hard_fp_p = maybe_ne (offset, 0);
+ frame.below_hard_fp_saved_regs_size = offset;
if (frame.emit_frame_chain)
{
/* FP and LR are placed in the linkage record. */
- frame.reg_offset[R29_REGNUM] = 0;
+ frame.reg_offset[R29_REGNUM] = offset;
frame.wb_candidate1 = R29_REGNUM;
- frame.reg_offset[R30_REGNUM] = UNITS_PER_WORD;
+ frame.reg_offset[R30_REGNUM] = offset + UNITS_PER_WORD;
frame.wb_candidate2 = R30_REGNUM;
- offset = 2 * UNITS_PER_WORD;
+ offset += 2 * UNITS_PER_WORD;
}
- /* With stack-clash, LR must be saved in non-leaf functions. */
- gcc_assert (crtl->is_leaf
- || frame.reg_offset[R30_REGNUM] != SLOT_NOT_REQUIRED);
-
- /* Now assign stack slots for them. */
for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
- if (frame.reg_offset[regno] == SLOT_REQUIRED)
+ if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
{
frame.reg_offset[regno] = offset;
if (frame.wb_candidate1 == INVALID_REGNUM)
@@ -5551,19 +5861,19 @@ aarch64_layout_frame (void)
offset += UNITS_PER_WORD;
}
- HOST_WIDE_INT max_int_offset = offset;
- offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT);
- bool has_align_gap = offset != max_int_offset;
+ poly_int64 max_int_offset = offset;
+ offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+ bool has_align_gap = maybe_ne (offset, max_int_offset);
for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
- if (frame.reg_offset[regno] == SLOT_REQUIRED)
+ if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
{
/* If there is an alignment gap between integer and fp callee-saves,
allocate the last fp register to it if possible. */
if (regno == last_fp_reg
&& has_align_gap
- && !simd_function
- && (offset & 8) == 0)
+ && known_eq (vector_save_size, 8)
+ && multiple_p (offset, 16))
{
frame.reg_offset[regno] = max_int_offset;
break;
@@ -5575,31 +5885,34 @@ aarch64_layout_frame (void)
else if (frame.wb_candidate2 == INVALID_REGNUM
&& frame.wb_candidate1 >= V0_REGNUM)
frame.wb_candidate2 = regno;
- offset += simd_function ? UNITS_PER_VREG : UNITS_PER_WORD;
+ offset += vector_save_size;
}
- offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+ offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
frame.saved_regs_size = offset;
- HOST_WIDE_INT varargs_and_saved_regs_size
- = offset + frame.saved_varargs_size;
+ poly_int64 varargs_and_saved_regs_size = offset + frame.saved_varargs_size;
- frame.hard_fp_offset
+ poly_int64 above_outgoing_args
= aligned_upper_bound (varargs_and_saved_regs_size
+ get_frame_size (),
STACK_BOUNDARY / BITS_PER_UNIT);
+ frame.hard_fp_offset
+ = above_outgoing_args - frame.below_hard_fp_saved_regs_size;
+
/* Both these values are already aligned. */
gcc_assert (multiple_p (crtl->outgoing_args_size,
STACK_BOUNDARY / BITS_PER_UNIT));
- frame.frame_size = frame.hard_fp_offset + crtl->outgoing_args_size;
+ frame.frame_size = above_outgoing_args + crtl->outgoing_args_size;
frame.locals_offset = frame.saved_varargs_size;
frame.initial_adjust = 0;
frame.final_adjust = 0;
frame.callee_adjust = 0;
+ frame.sve_callee_adjust = 0;
frame.callee_offset = 0;
HOST_WIDE_INT max_push_offset = 0;
@@ -5609,53 +5922,86 @@ aarch64_layout_frame (void)
max_push_offset = 256;
HOST_WIDE_INT const_size, const_outgoing_args_size, const_fp_offset;
+ HOST_WIDE_INT const_saved_regs_size;
if (frame.frame_size.is_constant (&const_size)
&& const_size < max_push_offset
- && known_eq (crtl->outgoing_args_size, 0))
+ && known_eq (frame.hard_fp_offset, const_size))
{
/* Simple, small frame with no outgoing arguments:
+
stp reg1, reg2, [sp, -frame_size]!
stp reg3, reg4, [sp, 16] */
frame.callee_adjust = const_size;
}
else if (crtl->outgoing_args_size.is_constant (&const_outgoing_args_size)
- && const_outgoing_args_size + frame.saved_regs_size < 512
+ && frame.saved_regs_size.is_constant (&const_saved_regs_size)
+ && const_outgoing_args_size + const_saved_regs_size < 512
+ /* We could handle this case even with outgoing args, provided
+ that the number of args left us with valid offsets for all
+ predicate and vector save slots. It's such a rare case that
+ it hardly seems worth the effort though. */
+ && (!saves_below_hard_fp_p || const_outgoing_args_size == 0)
&& !(cfun->calls_alloca
&& frame.hard_fp_offset.is_constant (&const_fp_offset)
&& const_fp_offset < max_push_offset))
{
/* Frame with small outgoing arguments:
+
sub sp, sp, frame_size
stp reg1, reg2, [sp, outgoing_args_size]
stp reg3, reg4, [sp, outgoing_args_size + 16] */
frame.initial_adjust = frame.frame_size;
frame.callee_offset = const_outgoing_args_size;
}
+ else if (saves_below_hard_fp_p
+ && known_eq (frame.saved_regs_size,
+ frame.below_hard_fp_saved_regs_size))
+ {
+ /* Frame in which all saves are SVE saves:
+
+ sub sp, sp, hard_fp_offset + below_hard_fp_saved_regs_size
+ save SVE registers relative to SP
+ sub sp, sp, outgoing_args_size */
+ frame.initial_adjust = (frame.hard_fp_offset
+ + frame.below_hard_fp_saved_regs_size);
+ frame.final_adjust = crtl->outgoing_args_size;
+ }
else if (frame.hard_fp_offset.is_constant (&const_fp_offset)
&& const_fp_offset < max_push_offset)
{
- /* Frame with large outgoing arguments but a small local area:
+ /* Frame with large outgoing arguments or SVE saves, but with
+ a small local area:
+
stp reg1, reg2, [sp, -hard_fp_offset]!
stp reg3, reg4, [sp, 16]
+ [sub sp, sp, below_hard_fp_saved_regs_size]
+ [save SVE registers relative to SP]
sub sp, sp, outgoing_args_size */
frame.callee_adjust = const_fp_offset;
+ frame.sve_callee_adjust = frame.below_hard_fp_saved_regs_size;
frame.final_adjust = crtl->outgoing_args_size;
}
else
{
- /* Frame with large local area and outgoing arguments using frame pointer:
+ /* Frame with large local area and outgoing arguments or SVE saves,
+ using frame pointer:
+
sub sp, sp, hard_fp_offset
stp x29, x30, [sp, 0]
add x29, sp, 0
stp reg3, reg4, [sp, 16]
+ [sub sp, sp, below_hard_fp_saved_regs_size]
+ [save SVE registers relative to SP]
sub sp, sp, outgoing_args_size */
frame.initial_adjust = frame.hard_fp_offset;
+ frame.sve_callee_adjust = frame.below_hard_fp_saved_regs_size;
frame.final_adjust = crtl->outgoing_args_size;
}
/* Make sure the individual adjustments add up to the full frame size. */
gcc_assert (known_eq (frame.initial_adjust
+ frame.callee_adjust
+ + frame.sve_callee_adjust
+ frame.final_adjust, frame.frame_size));
frame.laid_out = true;
@@ -5667,7 +6013,7 @@ aarch64_layout_frame (void)
static bool
aarch64_register_saved_on_entry (int regno)
{
- return cfun->machine->frame.reg_offset[regno] >= 0;
+ return known_ge (cfun->machine->frame.reg_offset[regno], 0);
}
/* Return the next register up from REGNO up to LIMIT for the callee
@@ -5734,7 +6080,7 @@ static void
aarch64_push_regs (unsigned regno1, unsigned regno2, HOST_WIDE_INT adjustment)
{
rtx_insn *insn;
- machine_mode mode = aarch64_reg_save_mode (cfun->decl, regno1);
+ machine_mode mode = aarch64_reg_save_mode (regno1);
if (regno2 == INVALID_REGNUM)
return aarch64_pushwb_single_reg (mode, regno1, adjustment);
@@ -5780,7 +6126,7 @@ static void
aarch64_pop_regs (unsigned regno1, unsigned regno2, HOST_WIDE_INT adjustment,
rtx *cfi_ops)
{
- machine_mode mode = aarch64_reg_save_mode (cfun->decl, regno1);
+ machine_mode mode = aarch64_reg_save_mode (regno1);
rtx reg1 = gen_rtx_REG (mode, regno1);
*cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg1, *cfi_ops);
@@ -5859,7 +6205,7 @@ aarch64_return_address_signing_enabled (void)
if its LR is pushed onto stack. */
return (aarch64_ra_sign_scope == AARCH64_FUNCTION_ALL
|| (aarch64_ra_sign_scope == AARCH64_FUNCTION_NON_LEAF
- && cfun->machine->frame.reg_offset[LR_REGNUM] >= 0));
+ && known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0)));
}
/* Return TRUE if Branch Target Identification Mechanism is enabled. */
@@ -5869,17 +6215,75 @@ aarch64_bti_enabled (void)
return (aarch64_enable_bti == 1);
}
+/* The caller is going to use ST1D or LD1D to save or restore an SVE
+ register in mode MODE at BASE_RTX + OFFSET, where OFFSET is in
+ the range [1, 16] * GET_MODE_SIZE (MODE). Prepare for this by:
+
+ (1) updating BASE_RTX + OFFSET so that it is a legitimate ST1D
+ or LD1D address
+
+ (2) setting PRED to a valid predicate register for the ST1D or LD1D,
+ if the variable isn't already nonnull
+
+ (1) is needed when OFFSET is in the range [8, 16] * GET_MODE_SIZE (MODE).
+ Handle this case using a temporary base register that is suitable for
+ all offsets in that range. Use ANCHOR_REG as this base register if it
+ is nonnull, otherwise create a new register and store it in ANCHOR_REG. */
+
+static inline void
+aarch64_adjust_sve_callee_save_base (machine_mode mode, rtx &base_rtx,
+ rtx &anchor_reg, poly_int64 &offset,
+ rtx &ptrue)
+{
+ if (maybe_ge (offset, 8 * GET_MODE_SIZE (mode)))
+ {
+ /* This is the maximum valid offset of the anchor from the base.
+ Lower values would be valid too. */
+ poly_int64 anchor_offset = 16 * GET_MODE_SIZE (mode);
+ if (!anchor_reg)
+ {
+ anchor_reg = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
+ emit_insn (gen_add3_insn (anchor_reg, base_rtx,
+ gen_int_mode (anchor_offset, Pmode)));
+ }
+ base_rtx = anchor_reg;
+ offset -= anchor_offset;
+ }
+ if (!ptrue)
+ {
+ int pred_reg = cfun->machine->frame.spare_pred_reg;
+ emit_move_insn (gen_rtx_REG (VNx16BImode, pred_reg),
+ CONSTM1_RTX (VNx16BImode));
+ ptrue = gen_rtx_REG (VNx2BImode, pred_reg);
+ }
+}
+
+/* Add a REG_CFA_EXPRESSION note to INSN to say that register REG
+ is saved at BASE + OFFSET. */
+
+static void
+aarch64_add_cfa_expression (rtx_insn *insn, rtx reg,
+ rtx base, poly_int64 offset)
+{
+ rtx mem = gen_frame_mem (GET_MODE (reg),
+ plus_constant (Pmode, base, offset));
+ add_reg_note (insn, REG_CFA_EXPRESSION, gen_rtx_SET (mem, reg));
+}
+
/* Emit code to save the callee-saved registers from register number START
to LIMIT to the stack at the location starting at offset START_OFFSET,
- skipping any write-back candidates if SKIP_WB is true. */
+ skipping any write-back candidates if SKIP_WB is true. HARD_FP_VALID_P
+ is true if the hard frame pointer has been set up. */
static void
-aarch64_save_callee_saves (machine_mode mode, poly_int64 start_offset,
- unsigned start, unsigned limit, bool skip_wb)
+aarch64_save_callee_saves (poly_int64 start_offset,
+ unsigned start, unsigned limit, bool skip_wb,
+ bool hard_fp_valid_p)
{
rtx_insn *insn;
unsigned regno;
unsigned regno2;
+ rtx anchor_reg = NULL_RTX, ptrue = NULL_RTX;
for (regno = aarch64_next_callee_save (start, limit);
regno <= limit;
@@ -5887,7 +6291,7 @@ aarch64_save_callee_saves (machine_mode mode, poly_int64 start_offset,
{
rtx reg, mem;
poly_int64 offset;
- int offset_diff;
+ bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno);
if (skip_wb
&& (regno == cfun->machine->frame.wb_candidate1
@@ -5895,27 +6299,53 @@ aarch64_save_callee_saves (machine_mode mode, poly_int64 start_offset,
continue;
if (cfun->machine->reg_is_wrapped_separately[regno])
- continue;
+ continue;
+ machine_mode mode = aarch64_reg_save_mode (regno);
reg = gen_rtx_REG (mode, regno);
offset = start_offset + cfun->machine->frame.reg_offset[regno];
- mem = gen_frame_mem (mode, plus_constant (Pmode, stack_pointer_rtx,
- offset));
+ rtx base_rtx = stack_pointer_rtx;
+ poly_int64 sp_offset = offset;
- regno2 = aarch64_next_callee_save (regno + 1, limit);
- offset_diff = cfun->machine->frame.reg_offset[regno2]
- - cfun->machine->frame.reg_offset[regno];
+ HOST_WIDE_INT const_offset;
+ if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
+ aarch64_adjust_sve_callee_save_base (mode, base_rtx, anchor_reg,
+ offset, ptrue);
+ else if (GP_REGNUM_P (regno)
+ && (!offset.is_constant (&const_offset) || const_offset >= 512))
+ {
+ gcc_assert (known_eq (start_offset, 0));
+ poly_int64 fp_offset
+ = cfun->machine->frame.below_hard_fp_saved_regs_size;
+ if (hard_fp_valid_p)
+ base_rtx = hard_frame_pointer_rtx;
+ else
+ {
+ if (!anchor_reg)
+ {
+ anchor_reg = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
+ emit_insn (gen_add3_insn (anchor_reg, base_rtx,
+ gen_int_mode (fp_offset, Pmode)));
+ }
+ base_rtx = anchor_reg;
+ }
+ offset -= fp_offset;
+ }
+ mem = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset));
+ bool need_cfa_note_p = (base_rtx != stack_pointer_rtx);
- if (regno2 <= limit
+ if (!aarch64_sve_mode_p (mode)
+ && (regno2 = aarch64_next_callee_save (regno + 1, limit)) <= limit
&& !cfun->machine->reg_is_wrapped_separately[regno2]
- && known_eq (GET_MODE_SIZE (mode), offset_diff))
+ && known_eq (GET_MODE_SIZE (mode),
+ cfun->machine->frame.reg_offset[regno2]
+ - cfun->machine->frame.reg_offset[regno]))
{
rtx reg2 = gen_rtx_REG (mode, regno2);
rtx mem2;
- offset = start_offset + cfun->machine->frame.reg_offset[regno2];
- mem2 = gen_frame_mem (mode, plus_constant (Pmode, stack_pointer_rtx,
- offset));
+ offset += GET_MODE_SIZE (mode);
+ mem2 = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset));
insn = emit_insn (aarch64_gen_store_pair (mode, mem, reg, mem2,
reg2));
@@ -5923,71 +6353,96 @@ aarch64_save_callee_saves (machine_mode mode, poly_int64 start_offset,
always assumed to be relevant to the frame
calculations; subsequent parts, are only
frame-related if explicitly marked. */
- RTX_FRAME_RELATED_P (XVECEXP (PATTERN (insn), 0, 1)) = 1;
+ if (aarch64_emit_cfi_for_reg_p (regno2))
+ {
+ if (need_cfa_note_p)
+ aarch64_add_cfa_expression (insn, reg2, stack_pointer_rtx,
+ sp_offset + GET_MODE_SIZE (mode));
+ else
+ RTX_FRAME_RELATED_P (XVECEXP (PATTERN (insn), 0, 1)) = 1;
+ }
+
regno = regno2;
}
+ else if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
+ {
+ insn = emit_insn (gen_aarch64_pred_mov (mode, mem, ptrue, reg));
+ need_cfa_note_p = true;
+ }
+ else if (aarch64_sve_mode_p (mode))
+ insn = emit_insn (gen_rtx_SET (mem, reg));
else
insn = emit_move_insn (mem, reg);
- RTX_FRAME_RELATED_P (insn) = 1;
+ RTX_FRAME_RELATED_P (insn) = frame_related_p;
+ if (frame_related_p && need_cfa_note_p)
+ aarch64_add_cfa_expression (insn, reg, stack_pointer_rtx, sp_offset);
}
}
-/* Emit code to restore the callee registers of mode MODE from register
- number START up to and including LIMIT. Restore from the stack offset
- START_OFFSET, skipping any write-back candidates if SKIP_WB is true.
- Write the appropriate REG_CFA_RESTORE notes into CFI_OPS. */
+/* Emit code to restore the callee registers from register number START
+ up to and including LIMIT. Restore from the stack offset START_OFFSET,
+ skipping any write-back candidates if SKIP_WB is true. Write the
+ appropriate REG_CFA_RESTORE notes into CFI_OPS. */
static void
-aarch64_restore_callee_saves (machine_mode mode,
- poly_int64 start_offset, unsigned start,
+aarch64_restore_callee_saves (poly_int64 start_offset, unsigned start,
unsigned limit, bool skip_wb, rtx *cfi_ops)
{
- rtx base_rtx = stack_pointer_rtx;
unsigned regno;
unsigned regno2;
poly_int64 offset;
+ rtx anchor_reg = NULL_RTX, ptrue = NULL_RTX;
for (regno = aarch64_next_callee_save (start, limit);
regno <= limit;
regno = aarch64_next_callee_save (regno + 1, limit))
{
+ bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno);
if (cfun->machine->reg_is_wrapped_separately[regno])
- continue;
+ continue;
rtx reg, mem;
- int offset_diff;
if (skip_wb
&& (regno == cfun->machine->frame.wb_candidate1
|| regno == cfun->machine->frame.wb_candidate2))
continue;
+ machine_mode mode = aarch64_reg_save_mode (regno);
reg = gen_rtx_REG (mode, regno);
offset = start_offset + cfun->machine->frame.reg_offset[regno];
+ rtx base_rtx = stack_pointer_rtx;
+ if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
+ aarch64_adjust_sve_callee_save_base (mode, base_rtx, anchor_reg,
+ offset, ptrue);
mem = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset));
- regno2 = aarch64_next_callee_save (regno + 1, limit);
- offset_diff = cfun->machine->frame.reg_offset[regno2]
- - cfun->machine->frame.reg_offset[regno];
-
- if (regno2 <= limit
+ if (!aarch64_sve_mode_p (mode)
+ && (regno2 = aarch64_next_callee_save (regno + 1, limit)) <= limit
&& !cfun->machine->reg_is_wrapped_separately[regno2]
- && known_eq (GET_MODE_SIZE (mode), offset_diff))
+ && known_eq (GET_MODE_SIZE (mode),
+ cfun->machine->frame.reg_offset[regno2]
+ - cfun->machine->frame.reg_offset[regno]))
{
rtx reg2 = gen_rtx_REG (mode, regno2);
rtx mem2;
- offset = start_offset + cfun->machine->frame.reg_offset[regno2];
+ offset += GET_MODE_SIZE (mode);
mem2 = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset));
emit_insn (aarch64_gen_load_pair (mode, reg, mem, reg2, mem2));
*cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg2, *cfi_ops);
regno = regno2;
}
+ else if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
+ emit_insn (gen_aarch64_pred_mov (mode, reg, ptrue, mem));
+ else if (aarch64_sve_mode_p (mode))
+ emit_insn (gen_rtx_SET (reg, mem));
else
emit_move_insn (reg, mem);
- *cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg, *cfi_ops);
+ if (frame_related_p)
+ *cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg, *cfi_ops);
}
}
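
For context, a minimal sketch (not part of the patch) of the kind of function whose saves and restores go through these new SVE paths: any function that takes or returns values in SVE registers. It assumes an SVE-enabled toolchain and the ACLE header arm_sve.h; the function name is illustrative.

  #include <arm_sve.h>

  /* Takes and returns SVE vectors, so it follows the SVE calling convention;
     if its body needed z8-z23 or p4-p11, the prologue/epilogue code above
     would have to spill and reload them in VL-sized frame slots.  */
  svfloat64_t
  scale_and_accumulate (svbool_t pg, svfloat64_t acc, svfloat64_t x, double k)
  {
    return svmla_n_f64_x (pg, acc, x, k);
  }
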
@@ -6069,13 +6524,35 @@ aarch64_get_separate_components (void)
for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
if (aarch64_register_saved_on_entry (regno))
{
+ /* Punt on saves and restores that use ST1D and LD1D. We could
+ try to be smarter, but it would involve making sure that the
+ spare predicate register itself is safe to use at the save
+ and restore points. Also, when a frame pointer is being used,
+ the slots are often out of reach of ST1D and LD1D anyway. */
+ machine_mode mode = aarch64_reg_save_mode (regno);
+ if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
+ continue;
+
poly_int64 offset = cfun->machine->frame.reg_offset[regno];
- if (!frame_pointer_needed)
- offset += cfun->machine->frame.frame_size
- - cfun->machine->frame.hard_fp_offset;
+
+ /* If the register is saved in the first SVE save slot, we use
+ it as a stack probe for -fstack-clash-protection. */
+ if (flag_stack_clash_protection
+ && maybe_ne (cfun->machine->frame.below_hard_fp_saved_regs_size, 0)
+ && known_eq (offset, 0))
+ continue;
+
+ /* Get the offset relative to the register we'll use. */
+ if (frame_pointer_needed)
+ offset -= cfun->machine->frame.below_hard_fp_saved_regs_size;
+ else
+ offset += crtl->outgoing_args_size;
+
/* Check that we can access the stack slot of the register with one
direct load with no adjustments needed. */
- if (offset_12bit_unsigned_scaled_p (DImode, offset))
+ if (aarch64_sve_mode_p (mode)
+ ? offset_9bit_signed_scaled_p (mode, offset)
+ : offset_12bit_unsigned_scaled_p (mode, offset))
bitmap_set_bit (components, regno);
}
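
As a rough standalone restatement of the two reachability checks used above (hypothetical helper names, constant offsets only): base LDR/STR take an unsigned 12-bit immediate scaled by the access size, while SVE LDR/STR take a signed multiple-of-register-size immediate in the range [-256, 255].

  #include <stdbool.h>
  #include <stdint.h>

  /* Plain LDR/STR: unsigned immediate, scaled by the access size, 0..4095.  */
  static bool
  fits_12bit_unsigned_scaled (int64_t offset, int64_t size)
  {
    return offset % size == 0 && offset >= 0 && offset / size <= 4095;
  }

  /* SVE LDR/STR: signed immediate, scaled by the register size, -256..255.  */
  static bool
  fits_9bit_signed_scaled (int64_t offset, int64_t size)
  {
    return offset % size == 0 && offset / size >= -256 && offset / size <= 255;
  }
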
@@ -6083,6 +6560,12 @@ aarch64_get_separate_components (void)
if (frame_pointer_needed)
bitmap_clear_bit (components, HARD_FRAME_POINTER_REGNUM);
+ /* If the spare predicate register used by big-endian SVE code
+ is call-preserved, it must be saved in the main prologue
+ before any saves that use it. */
+ if (cfun->machine->frame.spare_pred_reg != INVALID_REGNUM)
+ bitmap_clear_bit (components, cfun->machine->frame.spare_pred_reg);
+
unsigned reg1 = cfun->machine->frame.wb_candidate1;
unsigned reg2 = cfun->machine->frame.wb_candidate2;
/* If registers have been chosen to be stored/restored with
@@ -6136,18 +6619,19 @@ aarch64_components_for_bb (basic_block bb)
|| bitmap_bit_p (gen, regno)
|| bitmap_bit_p (kill, regno)))
{
- unsigned regno2, offset, offset2;
bitmap_set_bit (components, regno);
/* If there is a callee-save at an adjacent offset, add it too
to increase the use of LDP/STP. */
- offset = cfun->machine->frame.reg_offset[regno];
- regno2 = ((offset & 8) == 0) ? regno + 1 : regno - 1;
+ poly_int64 offset = cfun->machine->frame.reg_offset[regno];
+ unsigned regno2 = multiple_p (offset, 16) ? regno + 1 : regno - 1;
if (regno2 <= LAST_SAVED_REGNUM)
{
- offset2 = cfun->machine->frame.reg_offset[regno2];
- if ((offset & ~8) == (offset2 & ~8))
+ poly_int64 offset2 = cfun->machine->frame.reg_offset[regno2];
+ if (regno < regno2
+ ? known_eq (offset + 8, offset2)
+ : multiple_p (offset2, 16) && known_eq (offset2 + 8, offset))
bitmap_set_bit (components, regno2);
}
}
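
Restated outside the compiler (hypothetical helper, constant offsets only): a save slot that starts on a 16-byte boundary is paired with the next register's slot at +8, otherwise with the previous register's slot at -8, which is what lets the two saves be shrink-wrapped together as a single LDP/STP.

  #include <stdbool.h>
  #include <stdint.h>

  static bool
  forms_ldp_stp_pair (int64_t offset, int64_t partner_offset, bool partner_is_next)
  {
    if (partner_is_next)
      return offset % 16 == 0 && partner_offset == offset + 8;
    return partner_offset % 16 == 0 && offset == partner_offset + 8;
  }
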
@@ -6202,16 +6686,16 @@ aarch64_process_components (sbitmap components, bool prologue_p)
while (regno != last_regno)
{
- /* AAPCS64 section 5.1.2 requires only the low 64 bits to be saved
- so DFmode for the vector registers is enough. For simd functions
- we want to save the low 128 bits. */
- machine_mode mode = aarch64_reg_save_mode (cfun->decl, regno);
+ bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno);
+ machine_mode mode = aarch64_reg_save_mode (regno);
rtx reg = gen_rtx_REG (mode, regno);
poly_int64 offset = cfun->machine->frame.reg_offset[regno];
- if (!frame_pointer_needed)
- offset += cfun->machine->frame.frame_size
- - cfun->machine->frame.hard_fp_offset;
+ if (frame_pointer_needed)
+ offset -= cfun->machine->frame.below_hard_fp_saved_regs_size;
+ else
+ offset += crtl->outgoing_args_size;
+
rtx addr = plus_constant (Pmode, ptr_reg, offset);
rtx mem = gen_frame_mem (mode, addr);
@@ -6222,39 +6706,49 @@ aarch64_process_components (sbitmap components, bool prologue_p)
if (regno2 == last_regno)
{
insn = emit_insn (set);
- RTX_FRAME_RELATED_P (insn) = 1;
- if (prologue_p)
- add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set));
- else
- add_reg_note (insn, REG_CFA_RESTORE, reg);
+ if (frame_related_p)
+ {
+ RTX_FRAME_RELATED_P (insn) = 1;
+ if (prologue_p)
+ add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set));
+ else
+ add_reg_note (insn, REG_CFA_RESTORE, reg);
+ }
break;
}
poly_int64 offset2 = cfun->machine->frame.reg_offset[regno2];
/* The next register is not of the same class or its offset is not
mergeable with the current one into a pair. */
- if (!satisfies_constraint_Ump (mem)
+ if (aarch64_sve_mode_p (mode)
+ || !satisfies_constraint_Ump (mem)
|| GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
|| (crtl->abi->id () == ARM_PCS_SIMD && FP_REGNUM_P (regno))
|| maybe_ne ((offset2 - cfun->machine->frame.reg_offset[regno]),
GET_MODE_SIZE (mode)))
{
insn = emit_insn (set);
- RTX_FRAME_RELATED_P (insn) = 1;
- if (prologue_p)
- add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set));
- else
- add_reg_note (insn, REG_CFA_RESTORE, reg);
+ if (frame_related_p)
+ {
+ RTX_FRAME_RELATED_P (insn) = 1;
+ if (prologue_p)
+ add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set));
+ else
+ add_reg_note (insn, REG_CFA_RESTORE, reg);
+ }
regno = regno2;
continue;
}
+ bool frame_related2_p = aarch64_emit_cfi_for_reg_p (regno2);
+
/* REGNO2 can be saved/restored in a pair with REGNO. */
rtx reg2 = gen_rtx_REG (mode, regno2);
- if (!frame_pointer_needed)
- offset2 += cfun->machine->frame.frame_size
- - cfun->machine->frame.hard_fp_offset;
+ if (frame_pointer_needed)
+ offset2 -= cfun->machine->frame.below_hard_fp_saved_regs_size;
+ else
+ offset2 += crtl->outgoing_args_size;
rtx addr2 = plus_constant (Pmode, ptr_reg, offset2);
rtx mem2 = gen_frame_mem (mode, addr2);
rtx set2 = prologue_p ? gen_rtx_SET (mem2, reg2)
@@ -6265,16 +6759,23 @@ aarch64_process_components (sbitmap components, bool prologue_p)
else
insn = emit_insn (aarch64_gen_load_pair (mode, reg, mem, reg2, mem2));
- RTX_FRAME_RELATED_P (insn) = 1;
- if (prologue_p)
+ if (frame_related_p || frame_related2_p)
{
- add_reg_note (insn, REG_CFA_OFFSET, set);
- add_reg_note (insn, REG_CFA_OFFSET, set2);
- }
- else
- {
- add_reg_note (insn, REG_CFA_RESTORE, reg);
- add_reg_note (insn, REG_CFA_RESTORE, reg2);
+ RTX_FRAME_RELATED_P (insn) = 1;
+ if (prologue_p)
+ {
+ if (frame_related_p)
+ add_reg_note (insn, REG_CFA_OFFSET, set);
+ if (frame_related2_p)
+ add_reg_note (insn, REG_CFA_OFFSET, set2);
+ }
+ else
+ {
+ if (frame_related_p)
+ add_reg_note (insn, REG_CFA_RESTORE, reg);
+ if (frame_related2_p)
+ add_reg_note (insn, REG_CFA_RESTORE, reg2);
+ }
}
regno = aarch64_get_next_set_bit (components, regno2 + 1);
@@ -6343,15 +6844,31 @@ aarch64_allocate_and_probe_stack_space (rtx temp1, rtx temp2,
HOST_WIDE_INT guard_size
= 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
- /* When doing the final adjustment for the outgoing argument size we can't
- assume that LR was saved at position 0. So subtract it's offset from the
- ABI safe buffer so that we don't accidentally allow an adjustment that
- would result in an allocation larger than the ABI buffer without
- probing. */
HOST_WIDE_INT min_probe_threshold
- = final_adjustment_p
- ? guard_used_by_caller - cfun->machine->frame.reg_offset[LR_REGNUM]
- : guard_size - guard_used_by_caller;
+ = (final_adjustment_p
+ ? guard_used_by_caller
+ : guard_size - guard_used_by_caller);
+ /* When doing the final adjustment for the outgoing arguments, take into
+ account any unprobed space there is above the current SP. There are
+ two cases:
+
+ - When saving SVE registers below the hard frame pointer, we force
+ the lowest save to take place in the prologue before doing the final
+ adjustment (i.e. we don't allow the save to be shrink-wrapped).
+ This acts as a probe at SP, so there is no unprobed space.
+
+ - When there are no SVE register saves, we use the store of the link
+ register as a probe. We can't assume that LR was saved at position 0
+ though, so treat any space below it as unprobed. */
+ if (final_adjustment_p
+ && known_eq (cfun->machine->frame.below_hard_fp_saved_regs_size, 0))
+ {
+ poly_int64 lr_offset = cfun->machine->frame.reg_offset[LR_REGNUM];
+ if (known_ge (lr_offset, 0))
+ min_probe_threshold -= lr_offset.to_constant ();
+ else
+ gcc_assert (!flag_stack_clash_protection || known_eq (poly_size, 0));
+ }
poly_int64 frame_size = cfun->machine->frame.frame_size;
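
A standalone sketch of the threshold computation above (hypothetical helper; constant values assumed, and the negative-LR-offset assertion omitted):

  #include <stdbool.h>
  #include <stdint.h>

  static int64_t
  min_probe_threshold (bool final_adjustment_p, int64_t guard_size,
                       int64_t guard_used_by_caller, int64_t lr_offset,
                       int64_t sve_save_size)
  {
    int64_t threshold = final_adjustment_p
                        ? guard_used_by_caller
                        : guard_size - guard_used_by_caller;
    /* With no SVE saves below the hard frame pointer, the LR store is the
       probe, so the space below its slot counts as unprobed.  */
    if (final_adjustment_p && sve_save_size == 0 && lr_offset >= 0)
      threshold -= lr_offset;
    return threshold;
  }
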
@@ -6361,13 +6878,15 @@ aarch64_allocate_and_probe_stack_space (rtx temp1, rtx temp2,
if (flag_stack_clash_protection && !final_adjustment_p)
{
poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
+ poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
poly_int64 final_adjust = cfun->machine->frame.final_adjust;
if (known_eq (frame_size, 0))
{
dump_stack_clash_frame_info (NO_PROBE_NO_FRAME, false);
}
- else if (known_lt (initial_adjust, guard_size - guard_used_by_caller)
+ else if (known_lt (initial_adjust + sve_callee_adjust,
+ guard_size - guard_used_by_caller)
&& known_lt (final_adjust, guard_used_by_caller))
{
dump_stack_clash_frame_info (NO_PROBE_SMALL_FRAME, true);
@@ -6571,18 +7090,6 @@ aarch64_epilogue_uses (int regno)
return 0;
}
-/* Add a REG_CFA_EXPRESSION note to INSN to say that register REG
- is saved at BASE + OFFSET. */
-
-static void
-aarch64_add_cfa_expression (rtx_insn *insn, unsigned int reg,
- rtx base, poly_int64 offset)
-{
- rtx mem = gen_frame_mem (DImode, plus_constant (Pmode, base, offset));
- add_reg_note (insn, REG_CFA_EXPRESSION,
- gen_rtx_SET (mem, regno_reg_rtx[reg]));
-}
-
/* AArch64 stack frames generated by this compiler look like:
+-------------------------------+
@@ -6604,8 +7111,12 @@ aarch64_add_cfa_expression (rtx_insn *insn, unsigned int reg,
+-------------------------------+ |
| LR' | |
+-------------------------------+ |
- | FP' | / <- hard_frame_pointer_rtx (aligned)
- +-------------------------------+
+ | FP' | |
+ +-------------------------------+ |<- hard_frame_pointer_rtx (aligned)
+ | SVE vector registers | | \
+ +-------------------------------+ | | below_hard_fp_saved_regs_size
+ | SVE predicate registers | / /
+ +-------------------------------+
| dynamic allocation |
+-------------------------------+
| padding |
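
A back-of-the-envelope worked example (illustrative numbers, ignoring any alignment padding the frame layout adds) of how large the below_hard_fp_saved_regs_size area shown in the diagram above can get when every SVE callee-saved register needs a slot: a Z register occupies VL bytes and a P register VL/8 bytes.

  #include <stdio.h>

  int
  main (void)
  {
    /* Worst case: 16 vector saves (z8-z23) and 8 predicate saves (p4-p11).  */
    for (int vl = 16; vl <= 256; vl *= 2)   /* vector length in bytes */
      printf ("VL = %3d bytes: %4d bytes of Z saves + %2d bytes of P saves\n",
              vl, 16 * vl, 8 * (vl / 8));
    return 0;
  }
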
@@ -6638,7 +7149,8 @@ aarch64_add_cfa_expression (rtx_insn *insn, unsigned int reg,
The following registers are reserved during frame layout and should not be
used for any other purpose:
- - r11: Used by stack clash protection when SVE is enabled.
+ - r11: Used by stack clash protection when SVE is enabled, and also
+ as an anchor register when saving and restoring registers
- r12(EP0) and r13(EP1): Used as temporaries for stack adjustment.
- r14 and r15: Used for speculation tracking.
- r16(IP0), r17(IP1): Used by indirect tailcalls.
@@ -6661,11 +7173,23 @@ aarch64_expand_prologue (void)
HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
poly_int64 final_adjust = cfun->machine->frame.final_adjust;
poly_int64 callee_offset = cfun->machine->frame.callee_offset;
+ poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
+ poly_int64 below_hard_fp_saved_regs_size
+ = cfun->machine->frame.below_hard_fp_saved_regs_size;
unsigned reg1 = cfun->machine->frame.wb_candidate1;
unsigned reg2 = cfun->machine->frame.wb_candidate2;
bool emit_frame_chain = cfun->machine->frame.emit_frame_chain;
rtx_insn *insn;
+ if (flag_stack_clash_protection && known_eq (callee_adjust, 0))
+ {
+ /* Fold the SVE allocation into the initial allocation.
+ We don't do this in aarch64_layout_frame to avoid pessimizing
+ the epilogue code. */
+ initial_adjust += sve_callee_adjust;
+ sve_callee_adjust = 0;
+ }
+
/* Sign return address for functions. */
if (aarch64_return_address_signing_enabled ())
{
@@ -6718,18 +7242,27 @@ aarch64_expand_prologue (void)
if (callee_adjust != 0)
aarch64_push_regs (reg1, reg2, callee_adjust);
+ /* The offset of the frame chain record (if any) from the current SP. */
+ poly_int64 chain_offset = (initial_adjust + callee_adjust
+ - cfun->machine->frame.hard_fp_offset);
+ gcc_assert (known_ge (chain_offset, 0));
+
+ /* The offset of the bottom of the save area from the current SP. */
+ poly_int64 saved_regs_offset = chain_offset - below_hard_fp_saved_regs_size;
+
if (emit_frame_chain)
{
- poly_int64 reg_offset = callee_adjust;
if (callee_adjust == 0)
{
reg1 = R29_REGNUM;
reg2 = R30_REGNUM;
- reg_offset = callee_offset;
- aarch64_save_callee_saves (DImode, reg_offset, reg1, reg2, false);
+ aarch64_save_callee_saves (saved_regs_offset, reg1, reg2,
+ false, false);
}
+ else
+ gcc_assert (known_eq (chain_offset, 0));
aarch64_add_offset (Pmode, hard_frame_pointer_rtx,
- stack_pointer_rtx, callee_offset,
+ stack_pointer_rtx, chain_offset,
tmp1_rtx, tmp0_rtx, frame_pointer_needed);
if (frame_pointer_needed && !frame_size.is_constant ())
{
@@ -6756,23 +7289,31 @@ aarch64_expand_prologue (void)
/* Change the save slot expressions for the registers that
we've already saved. */
- reg_offset -= callee_offset;
- aarch64_add_cfa_expression (insn, reg2, hard_frame_pointer_rtx,
- reg_offset + UNITS_PER_WORD);
- aarch64_add_cfa_expression (insn, reg1, hard_frame_pointer_rtx,
- reg_offset);
+ aarch64_add_cfa_expression (insn, regno_reg_rtx[reg2],
+ hard_frame_pointer_rtx, UNITS_PER_WORD);
+ aarch64_add_cfa_expression (insn, regno_reg_rtx[reg1],
+ hard_frame_pointer_rtx, 0);
}
emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
}
- aarch64_save_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
- callee_adjust != 0 || emit_frame_chain);
- if (crtl->abi->id () == ARM_PCS_SIMD)
- aarch64_save_callee_saves (TFmode, callee_offset, V0_REGNUM, V31_REGNUM,
- callee_adjust != 0 || emit_frame_chain);
- else
- aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
- callee_adjust != 0 || emit_frame_chain);
+ aarch64_save_callee_saves (saved_regs_offset, R0_REGNUM, R30_REGNUM,
+ callee_adjust != 0 || emit_frame_chain,
+ emit_frame_chain);
+ if (maybe_ne (sve_callee_adjust, 0))
+ {
+ gcc_assert (!flag_stack_clash_protection
+ || known_eq (initial_adjust, 0));
+ aarch64_allocate_and_probe_stack_space (tmp1_rtx, tmp0_rtx,
+ sve_callee_adjust,
+ !frame_pointer_needed, false);
+ saved_regs_offset += sve_callee_adjust;
+ }
+ aarch64_save_callee_saves (saved_regs_offset, P0_REGNUM, P15_REGNUM,
+ false, emit_frame_chain);
+ aarch64_save_callee_saves (saved_regs_offset, V0_REGNUM, V31_REGNUM,
+ callee_adjust != 0 || emit_frame_chain,
+ emit_frame_chain);
/* We may need to probe the final adjustment if it is larger than the guard
that is assumed by the callee. */
@@ -6810,6 +7351,9 @@ aarch64_expand_epilogue (bool for_sibcall)
HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
poly_int64 final_adjust = cfun->machine->frame.final_adjust;
poly_int64 callee_offset = cfun->machine->frame.callee_offset;
+ poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
+ poly_int64 below_hard_fp_saved_regs_size
+ = cfun->machine->frame.below_hard_fp_saved_regs_size;
unsigned reg1 = cfun->machine->frame.wb_candidate1;
unsigned reg2 = cfun->machine->frame.wb_candidate2;
rtx cfi_ops = NULL;
@@ -6823,15 +7367,23 @@ aarch64_expand_epilogue (bool for_sibcall)
= 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
- /* We can re-use the registers when the allocation amount is smaller than
- guard_size - guard_used_by_caller because we won't be doing any probes
- then. In such situations the register should remain live with the correct
+ /* We can re-use the registers when:
+
+ (a) the deallocation amount is the same as the corresponding
+ allocation amount (which is false if we combine the initial
+ and SVE callee save allocations in the prologue); and
+
+ (b) the allocation amount doesn't need a probe (which is false
+ if the amount is guard_size - guard_used_by_caller or greater).
+
+ In such situations the register should remain live with the correct
value. */
bool can_inherit_p = (initial_adjust.is_constant ()
- && final_adjust.is_constant ())
+ && final_adjust.is_constant ()
&& (!flag_stack_clash_protection
- || known_lt (initial_adjust,
- guard_size - guard_used_by_caller));
+ || (known_lt (initial_adjust,
+ guard_size - guard_used_by_caller)
+ && known_eq (sve_callee_adjust, 0))));
/* We need to add memory barrier to prevent read from deallocated stack. */
bool need_barrier_p
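
The two conditions in the comment above, restated as a standalone predicate (hypothetical helper; assumes the adjustments are compile-time constants):

  #include <stdbool.h>
  #include <stdint.h>

  static bool
  can_reuse_adjustment_regs (bool stack_clash_protection, int64_t initial_adjust,
                             int64_t sve_callee_adjust, int64_t guard_size,
                             int64_t guard_used_by_caller)
  {
    return !stack_clash_protection
           || (initial_adjust < guard_size - guard_used_by_caller
               && sve_callee_adjust == 0);
  }
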
@@ -6856,7 +7408,8 @@ aarch64_expand_epilogue (bool for_sibcall)
/* If writeback is used when restoring callee-saves, the CFA
is restored on the instruction doing the writeback. */
aarch64_add_offset (Pmode, stack_pointer_rtx,
- hard_frame_pointer_rtx, -callee_offset,
+ hard_frame_pointer_rtx,
+ -callee_offset - below_hard_fp_saved_regs_size,
tmp1_rtx, tmp0_rtx, callee_adjust == 0);
else
/* The case where we need to re-use the register here is very rare, so
@@ -6864,14 +7417,17 @@ aarch64_expand_epilogue (bool for_sibcall)
immediate doesn't fit. */
aarch64_add_sp (tmp1_rtx, tmp0_rtx, final_adjust, true);
- aarch64_restore_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
+ /* Restore the vector registers before the predicate registers,
+ so that we can use P4 as a temporary for big-endian SVE frames. */
+ aarch64_restore_callee_saves (callee_offset, V0_REGNUM, V31_REGNUM,
+ callee_adjust != 0, &cfi_ops);
+ aarch64_restore_callee_saves (callee_offset, P0_REGNUM, P15_REGNUM,
+ false, &cfi_ops);
+ if (maybe_ne (sve_callee_adjust, 0))
+ aarch64_add_sp (NULL_RTX, NULL_RTX, sve_callee_adjust, true);
+ aarch64_restore_callee_saves (callee_offset - sve_callee_adjust,
+ R0_REGNUM, R30_REGNUM,
callee_adjust != 0, &cfi_ops);
- if (crtl->abi->id () == ARM_PCS_SIMD)
- aarch64_restore_callee_saves (TFmode, callee_offset, V0_REGNUM, V31_REGNUM,
- callee_adjust != 0, &cfi_ops);
- else
- aarch64_restore_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
- callee_adjust != 0, &cfi_ops);
if (need_barrier_p)
emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
@@ -9397,13 +9953,14 @@ aarch64_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x,
secondary_reload_info *sri)
{
/* Use aarch64_sve_reload_be for SVE reloads that cannot be handled
- directly by the *aarch64_sve_mov<mode>_be move pattern. See the
+ directly by the *aarch64_sve_mov<mode>_[lb]e move patterns. See the
comment at the head of aarch64-sve.md for more details about the
big-endian handling. */
if (BYTES_BIG_ENDIAN
&& reg_class_subset_p (rclass, FP_REGS)
&& !((REG_P (x) && HARD_REGISTER_P (x))
|| aarch64_simd_valid_immediate (x, NULL))
+ && mode != VNx16QImode
&& aarch64_sve_data_mode_p (mode))
{
sri->icode = CODE_FOR_aarch64_sve_reload_be;
@@ -14983,6 +15540,10 @@ aapcs_vfp_sub_candidate (const_tree type, machine_mode *modep)
machine_mode mode;
HOST_WIDE_INT size;
+ /* SVE types (and types containing SVE types) must be handled
+ before calling this function. */
+ gcc_assert (!aarch64_sve::builtin_type_p (type));
+
switch (TREE_CODE (type))
{
case REAL_TYPE:
@@ -15154,6 +15715,9 @@ aarch64_short_vector_p (const_tree type,
{
poly_int64 size = -1;
+ if (type && aarch64_sve::builtin_type_p (type))
+ return false;
+
if (type && TREE_CODE (type) == VECTOR_TYPE)
size = int_size_in_bytes (type);
else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
@@ -15214,11 +15778,14 @@ aarch64_vfp_is_call_or_return_candidate (machine_mode mode,
int *count,
bool *is_ha)
{
+ if (is_ha != NULL) *is_ha = false;
+
+ if (type && aarch64_sve::builtin_type_p (type))
+ return false;
+
machine_mode new_mode = VOIDmode;
bool composite_p = aarch64_composite_type_p (type, mode);
- if (is_ha != NULL) *is_ha = false;
-
if ((!composite_p && GET_MODE_CLASS (mode) == MODE_FLOAT)
|| aarch64_short_vector_p (type, mode))
{
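
A compile-only illustration (declarations only; names are placeholders) of the distinction these checks enforce: fixed-length Advanced SIMD vectors remain candidates for the floating-point/short-vector rules, while arm_sve.h types are routed to the SVE argument-passing code instead.

  #include <arm_neon.h>
  #include <arm_sve.h>

  /* 128-bit short vector: still an FP/SIMD candidate, passed in a Q register.  */
  float64x2_t advsimd_arg (float64x2_t x);

  /* Scalable vector: never a short vector or HFA member, passed in a Z register.  */
  svfloat64_t sve_arg (svfloat64_t x);
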
@@ -17121,11 +17688,15 @@ aarch64_asm_preferred_eh_data_format (int code ATTRIBUTE_UNUSED, int global)
static void
aarch64_asm_output_variant_pcs (FILE *stream, const tree decl, const char* name)
{
- if (aarch64_simd_decl_p (decl))
+ if (TREE_CODE (decl) == FUNCTION_DECL)
{
- fprintf (stream, "\t.variant_pcs\t");
- assemble_name (stream, name);
- fprintf (stream, "\n");
+ arm_pcs pcs = (arm_pcs) fndecl_abi (decl).id ();
+ if (pcs == ARM_PCS_SIMD || pcs == ARM_PCS_SVE)
+ {
+ fprintf (stream, "\t.variant_pcs\t");
+ assemble_name (stream, name);
+ fprintf (stream, "\n");
+ }
}
}
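
With this change the directive is driven by the function's ABI rather than by the attribute check, so an SVE function like the sketch below (illustrative name, SVE toolchain assumed) should also get a ".variant_pcs sve_identity" marking in the assembly output.

  #include <arm_sve.h>

  svint32_t
  sve_identity (svint32_t x)
  {
    return x;
  }
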
@@ -21373,6 +21944,9 @@ aarch64_libgcc_floating_mode_supported_p
#undef TARGET_ASM_POST_CFI_STARTPROC
#define TARGET_ASM_POST_CFI_STARTPROC aarch64_post_cfi_startproc
+#undef TARGET_STRICT_ARGUMENT_NAMING
+#define TARGET_STRICT_ARGUMENT_NAMING hook_bool_CUMULATIVE_ARGS_true
+
struct gcc_target targetm = TARGET_INITIALIZER;
#include "gt-aarch64.h"
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 070fdc9..425a363 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -479,9 +479,10 @@ extern unsigned aarch64_architecture_version;
#define ARG_POINTER_REGNUM AP_REGNUM
#define FIRST_PSEUDO_REGISTER (FFRT_REGNUM + 1)
-/* The number of (integer) argument register available. */
+/* The number of argument registers available for each class. */
#define NUM_ARG_REGS 8
#define NUM_FP_ARG_REGS 8
+#define NUM_PR_ARG_REGS 4
/* A Homogeneous Floating-Point or Short-Vector Aggregate may have at most
four members. */
@@ -725,7 +726,7 @@ extern enum aarch64_processor aarch64_tune;
#ifdef HAVE_POLY_INT_H
struct GTY (()) aarch64_frame
{
- HOST_WIDE_INT reg_offset[FIRST_PSEUDO_REGISTER];
+ poly_int64 reg_offset[LAST_SAVED_REGNUM + 1];
/* The number of extra stack bytes taken up by register varargs.
This area is allocated by the callee at the very top of the
@@ -733,9 +734,12 @@ struct GTY (()) aarch64_frame
STACK_BOUNDARY. */
HOST_WIDE_INT saved_varargs_size;
- /* The size of the saved callee-save int/FP registers. */
+ /* The size of the callee-save registers with a slot in REG_OFFSET. */
+ poly_int64 saved_regs_size;
- HOST_WIDE_INT saved_regs_size;
+ /* The size of the callee-save registers with a slot in REG_OFFSET that
+ are saved below the hard frame pointer. */
+ poly_int64 below_hard_fp_saved_regs_size;
/* Offset from the base of the frame (incoming SP) to the
top of the locals area. This value is always a multiple of
@@ -763,6 +767,10 @@ struct GTY (()) aarch64_frame
It may be non-zero if no push is used (ie. callee_adjust == 0). */
poly_int64 callee_offset;
+ /* The size of the stack adjustment before saving or after restoring
+ SVE registers. */
+ poly_int64 sve_callee_adjust;
+
/* The size of the stack adjustment after saving callee-saves. */
poly_int64 final_adjust;
@@ -772,6 +780,11 @@ struct GTY (()) aarch64_frame
unsigned wb_candidate1;
unsigned wb_candidate2;
+ /* Big-endian SVE frames need a spare predicate register in order
+ to save vector registers in the correct layout for unwinding.
+ This is the register they should use. */
+ unsigned spare_pred_reg;
+
bool laid_out;
};
@@ -800,6 +813,8 @@ enum arm_pcs
{
ARM_PCS_AAPCS64, /* Base standard AAPCS for 64 bit. */
ARM_PCS_SIMD, /* For aarch64_vector_pcs functions. */
+ ARM_PCS_SVE, /* For functions that pass or return
+ values in SVE registers. */
ARM_PCS_TLSDESC, /* For targets of tlsdesc calls. */
ARM_PCS_UNKNOWN
};
@@ -827,6 +842,8 @@ typedef struct
int aapcs_nextncrn; /* Next next core register number. */
int aapcs_nvrn; /* Next Vector register number. */
int aapcs_nextnvrn; /* Next Next Vector register number. */
+ int aapcs_nprn; /* Next Predicate register number. */
+ int aapcs_nextnprn; /* Next Next Predicate register number. */
rtx aapcs_reg; /* Register assigned to this argument. This
is NULL_RTX if this parameter goes on
the stack. */
@@ -837,6 +854,8 @@ typedef struct
aapcs_reg == NULL_RTX. */
int aapcs_stack_size; /* The total size (in 8-byte words) of the
stack arg area so far. */
+ bool silent_p; /* True if we should act silently, rather than
+ raise an error for invalid calls. */
} CUMULATIVE_ARGS;
#endif
@@ -1144,7 +1163,8 @@ extern poly_uint16 aarch64_sve_vg;
#define BITS_PER_SVE_VECTOR (poly_uint16 (aarch64_sve_vg * 64))
#define BYTES_PER_SVE_VECTOR (poly_uint16 (aarch64_sve_vg * 8))
-/* The number of bytes in an SVE predicate. */
+/* The number of bits and bytes in an SVE predicate. */
+#define BITS_PER_SVE_PRED BYTES_PER_SVE_VECTOR
#define BYTES_PER_SVE_PRED aarch64_sve_vg
/* The SVE mode for a vector of bytes. */
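
A runnable worked example of the relationship documented a few lines up: vg counts 64-bit granules in a vector, and a predicate carries one bit per vector byte, so a predicate is vg bytes, i.e. as many bits as the vector has bytes.

  #include <stdio.h>

  int
  main (void)
  {
    for (int vl_bits = 128; vl_bits <= 2048; vl_bits *= 2)
      {
        int vg = vl_bits / 64;            /* 64-bit granules per vector */
        int pred_bits = vl_bits / 8;      /* one predicate bit per vector byte */
        printf ("VL = %4d bits: vg = %2d, predicate = %3d bits (%2d bytes)\n",
                vl_bits, vg, pred_bits, vg);
      }
    return 0;
  }
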
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d1fe173..f19e227 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -85,7 +85,6 @@
(V29_REGNUM 61)
(V30_REGNUM 62)
(V31_REGNUM 63)
- (LAST_SAVED_REGNUM 63)
(SFP_REGNUM 64)
(AP_REGNUM 65)
(CC_REGNUM 66)
@@ -107,6 +106,7 @@
(P13_REGNUM 81)
(P14_REGNUM 82)
(P15_REGNUM 83)
+ (LAST_SAVED_REGNUM 83)
(FFR_REGNUM 84)
;; "FFR token": a fake register used for representing the scheduling
;; restrictions on FFR-related operations.