path: root/gcc/config/aarch64
8 days ago  aarch64: Remove old aarch64_expand_sve_vec_cmp_float code  (Richard Sandiford; 3 files, -29/+11)

While looking at PR118956, I noticed that we had some dead code left over after the removal of the vcond patterns. The can_invert_p path is no longer used.

gcc/
    * config/aarch64/aarch64-protos.h (aarch64_expand_sve_vec_cmp_float):
    Remove can_invert_p argument and change return type to void.
    * config/aarch64/aarch64.cc (aarch64_expand_sve_vec_cmp_float): Likewise.
    * config/aarch64/aarch64-sve.md (vec_cmp<mode><vpred>): Update call accordingly.
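For reference, a minimal sketch of the interface change described above (parameter names beyond the removed argument are assumptions, not the exact GCC declaration):

    /* Before (sketch): returned whether the caller must invert the result.  */
    bool aarch64_expand_sve_vec_cmp_float (rtx target, rtx_code code,
                                           rtx op0, rtx op1, bool can_invert_p);

    /* After: always emits the complete comparison; nothing to return.  */
    void aarch64_expand_sve_vec_cmp_float (rtx target, rtx_code code,
                                           rtx op0, rtx op1);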
10 days ago  aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h  (Soumya AR; 1 file, -1/+1)

generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses generic_prefetch_tune in generic_armv8_a_tunings. This patch updates the pointer to generic_armv8_a_prefetch_tune.

This patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.

Signed-off-by: Soumya AR <soumyaa@nvidia.com>

gcc/ChangeLog:
    * config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch struct pointer.
2025-02-11  [PR target/115478] Accept ADD, IOR or XOR when combining objects with no bits in common  (Jeff Law; 2 files, -23/+31)

So the change to prefer ADD over IOR for combining two objects with no bits in common is (IMHO) generally good. It has some minor fallout. In particular the aarch64 port (and I suspect others) has patterns that recognize IOR, but not PLUS or XOR, for these cases, and thus tests which expected to optimize with IOR are no longer optimizing.

Roger suggested using a code iterator for this purpose. Richard S. suggested a new match operator to cover those cases. I really like the match operator idea, but as Richard S. notes in the PR it would require either not validating the "no bits in common", which dramatically reduces the utility IMHO, or we'd need some work to allow consistent results without polluting the nonzero bits cache.

So this patch goes back to Roger's idea of just using a code iterator in the aarch64 backend (and presumably anywhere else we see this popping up).

Bootstrapped and regression tested on aarch64-linux-gnu where it fixes bitint-args.c (as expected).

    PR target/115478
gcc/
    * config/aarch64/iterators.md (any_or_plus): New code iterator.
    * config/aarch64/aarch64.md (extr<mode>5_insn): Use any_or_plus.
    (extr<mode>5_insn_alt, extrsi5_insn_uxtw): Likewise.
    (extrsi5_insn_uxtw_alt, extrsi5_insn_di): Likewise.
gcc/testsuite/
    * gcc.target/aarch64/bitint-args.c: Update expected output.
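A standalone illustration of why the three codes are interchangeable here (not GCC source; nothing beyond standard C is assumed):

    /* When two values have no set bits in common, PLUS, IOR and XOR all
       produce the same result, so a pattern written for any one of them
       can cover combinations canonicalised to another.  */
    unsigned int
    combine_halves (unsigned int hi, unsigned int lo)
    {
      hi &= 0xffff0000u;            /* bits 16-31 only */
      lo &= 0x0000ffffu;            /* bits 0-15 only  */
      return hi | lo;               /* == hi + lo == (hi ^ lo) */
    }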
2025-02-11  aarch64: Update fp8 dependencies  (Andrew Carlotti; 1 file, -5/+5)

We agreed with LLVM developers to not enforce the architectural dependencies between fp8 multiplication features, and they have already been removed from LLVM and Binutils. Remove them from GCC as well.

gcc/ChangeLog:
    * config/aarch64/aarch64-option-extensions.def (SSVE_FP8FMA): Adjust formatting.
    (FP8DOT4): Replace FP8FMA dependency with FP8.
    (SSVE_FP8DOT4): Replace SSVE_FP8FMA dependency with SME2+FP8.
    (FP8DOT2): Replace FP8DOT4 dependency with FP8.
    (SSVE_FP8DOT2): Replace SSVE_FP8DOT4 dependency with SME2+FP8.
gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/pragma_cpp_predefs_4.c: Adjust expected defines.
    * gcc.target/aarch64/simd/vmla_lane_indices_1.c: Modify target pragmas.
    * gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c: Ditto.
    * gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c: Ditto.
    * gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Ditto.
    * gcc.target/aarch64/sve2/acle/asm/dot_mf8.c: Ditto.
2025-02-07  aarch64: gimple fold aes[ed] [PR114522]  (Andrew Pinski; 1 file, -0/+29)

Instead of waiting for the combine/RTL optimizations to be fixed here, this fixes the builtins at the gimple level. It should also give slightly faster compile times, since we get the simplification earlier on.

Built and tested for aarch64-linux-gnu.

gcc/ChangeLog:
    PR target/114522
    * config/aarch64/aarch64-builtins.cc (aarch64_fold_aes_op): New function.
    (aarch64_general_gimple_fold_builtin): Call aarch64_fold_aes_op for crypto_aese and crypto_aesd.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
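One plausible shape of such a fold, as a hedged illustration (the exact folds live in aarch64_fold_aes_op; the specific rule shown is an assumption): AESE/AESD begin by XORing their two operands, so an explicit XOR feeding a zero operand can be absorbed into the instruction.

    #include <arm_neon.h>

    /* Illustrative only (build with AES support, e.g. -march=armv8-a+crypto):
       aese(data ^ key, 0) computes the same value as aese(data, key),
       because the instruction XORs its inputs first.  */
    uint8x16_t
    aes_round (uint8x16_t data, uint8x16_t key)
    {
      uint8x16_t r = vaeseq_u8 (veorq_u8 (data, key), vdupq_n_u8 (0));
      /* After a fold of this kind the XOR disappears:
         uint8x16_t r = vaeseq_u8 (data, key);  */
      return vaesmcq_u8 (r);
    }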
2025-02-07  aarch64: Fix bootstrap with --enable-checking=release [PR118771]  (Andrew Pinski; 1 file, -0/+3)

With release checking we get a maybe-uninitialized warning inside aarch64_split_move, because jump threading creates a path for the case of `npieces == 0`; but `npieces` is never 0 (though there is no way for the compiler to know that). So fix the issue by adding a `gcc_assert` to the function which asserts that `npieces > 0`, which silences the warning.

Bootstrapped and tested on aarch64-linux-gnu (with and without --enable-checking=release).

The warning:

    aarch64.cc: In function 'void aarch64_split_move(rtx, rtx, machine_mode)':
    aarch64.cc:3418:31: error: '*(rtx_def**)((char*)&dst_pieces + offsetof(auto_vec<rtx_def*, 4>,auto_vec<rtx_def*, 4>::m_data[0]))' may be used uninitialized [-Werror=maybe-uninitialized]
     3418 |   if (reg_overlap_mentioned_p (dst_pieces[0], src))
          |       ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
    aarch64.cc:3408:20: note: 'dst_pieces' declared here
     3408 |   auto_vec<rtx, 4> dst_pieces, src_pieces;
          |                    ^~~~~~~~~~

    PR target/118771
gcc/ChangeLog:
    * config/aarch64/aarch64.cc (aarch64_split_move): Assert that npieces is greater than 0.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
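A generic, self-contained illustration of the technique (not the GCC source): asserting the count is nonzero lets the compiler prove the first element is written before it is read.

    #include <assert.h>

    /* If n could be 0, jump threading can expose a path on which v[0]
       is read uninitialized.  assert (n > 0) removes that path; this is
       the role gcc_assert (npieces > 0) plays in aarch64_split_move.  */
    int
    first_doubled (const int *src, unsigned int n)
    {
      int v[4];
      assert (n > 0);
      for (unsigned int i = 0; i < n && i < 4; i++)
        v[i] = 2 * src[i];
      return v[0];
    }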
2025-02-05  aarch64: Fix sve/acle/general/ldff1_8.c failures  (Richard Sandiford; 1 file, -1/+18)

gcc.target/aarch64/sve/acle/general/ldff1_8.c and gcc.target/aarch64/sve/ptest_1.c were failing because the aarch64 port was giving a zero (unknown) cost to instructions that compute two results in parallel. This was latent until r15-1575-gea8061f46a30, which fixed rtl-ssa to treat zero costs as unknown.

A long-standing todo here is to make insn_cost derive costs from md information, rather than having to write a lot of matching code in aarch64_rtx_costs. But that's not something we can do for GCC 15.

This patch instead treats the cost of a PARALLEL as being the maximum cost of its constituent sets. I don't like this very much, since it isn't really target-specific behaviour. If it were stage 1, I'd be trying to change pattern_cost instead.

gcc/
    * config/aarch64/aarch64.cc (aarch64_insn_cost): Give PARALLELs the same cost as the costliest SET.
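A sketch of the approach (assumed shape, not the actual GCC code; `set_cost` is a hypothetical helper standing in for the existing per-SET costing):

    /* Cost a PARALLEL as its most expensive constituent SET.  */
    static int
    parallel_max_set_cost (rtx pat)
    {
      int cost = 0;
      for (int i = 0; i < XVECLEN (pat, 0); i++)
        {
          rtx elt = XVECEXP (pat, 0, i);
          if (GET_CODE (elt) == SET)
            cost = MAX (cost, set_cost (elt));  /* hypothetical helper */
        }
      return cost;
    }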
2025-02-03  aarch64: Fix dupq_* testsuite failures  (Richard Sandiford; 1 file, -28/+82)

This patch fixes the dupq_* testsuite failures. The tests were introduced with r15-3669-ga92f54f580c3 (which was a nice improvement) and Pengxuan originally had a follow-on patch to recognise INDEX constants during vec_init.

I'd originally wanted to solve this a different way, using wildcards when building a vector and letting vector_builder::finalize find the best way of filling them in. I no longer think that's the best approach though. Stepped constants are likely to be more expensive than unstepped constants, so we should first try finding an unstepped constant that is valid, even if it has a longer representation than the stepped version. This patch therefore uses a variant of Pengxuan's idea.

While there, I noticed that the (old) code for finding an unstepped constant only tried varying one bit at a time. So for index 0 in a 16-element constant, the code would try picking a constant from index 8, 4, 2, and then 1. But since the goal is to create "fewer, larger, repeating parts", it would be better to iterate over a bit-reversed increment, so that after trying an XOR with 0 and 8, we try adding 4 to each previous attempt, then 2 to each previous attempt, and so on. In the previous example this would give 8, 4, 12, 2, 10, 6, 14, ...

The test shows an example of this for 8 shorts.

gcc/
    * config/aarch64/aarch64.cc (aarch64_choose_vector_init_constant): New function, split out from...
    (aarch64_expand_vector_init_fallback): ...here. Use a bit-reversed increment to find a constant index. Add support for stepped constants.
gcc/testsuite/
    * gcc.target/aarch64/sve/acle/general/dupq_12.c: New test.
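The iteration order is easier to see in a standalone program (illustrative only, not the GCC implementation):

    #include <stdio.h>

    /* Print candidate indices for a 16-element constant in bit-reversed
       order: 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15.
       Coarse strides come first, matching the goal of "fewer, larger,
       repeating parts".  */
    int
    main (void)
    {
      const unsigned int bits = 4;              /* log2 (nelts) */
      for (unsigned int i = 0; i < (1u << bits); i++)
        {
          unsigned int rev = 0;
          for (unsigned int b = 0; b < bits; b++)
            if (i & (1u << b))
              rev |= 1u << (bits - 1 - b);
          printf ("%u ", rev);
        }
      printf ("\n");
      return 0;
    }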
2025-01-24  aarch64: Add +cpa feature flag  (Andrew Carlotti; 2 files, -1/+3)

This doesn't enable anything within the compiler, but it allows the flag to be passed to the assembler. There also doesn't appear to be a kernel cpuinfo name yet.

gcc/ChangeLog:
    * config/aarch64/aarch64-arches.def (V9_5A): Add CPA.
    * config/aarch64/aarch64-option-extensions.def (CPA): New.
    * doc/invoke.texi: Document +cpa.
2025-01-24  aarch64: Add command line support for armv9.5-a  (Andrew Carlotti; 1 file, -0/+1)

gcc/ChangeLog:
    * config/aarch64/aarch64-arches.def (V9_5A): New.
    * doc/invoke.texi: Document armv9.5-a option.
gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/simd/armv9p5.c: New test.
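The new architecture level is selected with the usual -march spelling, optionally combined with extensions such as the +cpa flag above (illustrative invocation):

    gcc -march=armv9.5-a -c file.c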
2025-01-24  aarch64: Make AARCH64_FL_CRYPTO always unset  (Andrew Carlotti; 1 file, -4/+8)

This feature flag bit only exists to support the +crypto alias. Outside of option processing this bit needs to be set or unset consistently. This patch goes with the latter option.

gcc/ChangeLog:
    * common/config/aarch64/aarch64-common.cc: Assert that CRYPTO bit is not set.
    * config/aarch64/aarch64-feature-deps.h (info<FEAT>.explicit_on): Unset CRYPTO bit.
    (cpu_##CORE_IDENT): Ditto.
gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/crypto-alias-1.c: New test.
2025-01-24  aarch64: Refactor aarch64_rewrite_mcpu  (Andrew Carlotti; 1 file, -1/+0)

Use aarch64_validate_mcpu instead of the existing duplicate (and worse) version of the -mcpu parsing code. The original code used fatal_error; I'm guessing that using error instead should be ok.

gcc/ChangeLog:
    * common/config/aarch64/aarch64-common.cc (aarch64_rewrite_selected_cpu): Refactor and inline into...
    (aarch64_rewrite_mcpu): ...this.
    * config/aarch64/aarch64-protos.h (aarch64_rewrite_selected_cpu): Delete.
2025-01-24  aarch64: Rewrite architecture strings for assembler  (Andrew Carlotti; 4 files, -26/+23)

Add infrastructure to allow rewriting the architecture strings passed to the assembler (either as -march options or .arch directives). There was already canonicalisation everywhere except for an -march driver option passed directly to the compiler; this patch applies the same canonicalisation there as well.

gcc/ChangeLog:
    * common/config/aarch64/aarch64-common.cc (aarch64_get_arch_string_for_assembler): New.
    (aarch64_rewrite_march): New.
    (aarch64_rewrite_selected_cpu): Call new function.
    * config/aarch64/aarch64-elf.h (ASM_SPEC): Remove identity mapping.
    * config/aarch64/aarch64-protos.h (aarch64_get_arch_string_for_assembler): New.
    * config/aarch64/aarch64.cc (aarch64_declare_function_name): Call new function.
    (aarch64_start_file): Ditto.
    * config/aarch64/aarch64.h (EXTRA_SPEC_FUNCTIONS): Use new macro name.
    (MCPU_TO_MARCH_SPEC): Rename to...
    (MARCH_REWRITE_SPEC): ...this, and extend the spec rule.
    (aarch64_rewrite_march): New declaration.
    (MCPU_TO_MARCH_SPEC_FUNCTIONS): Rename to...
    (AARCH64_BASE_SPEC_FUNCTIONS): ...this, and add new function.
    (ASM_CPU_SPEC): Use new macro name.
2025-01-24  aarch64: Move arch/cpu parsing to aarch64-common.cc  (Andrew Carlotti; 2 files, -322/+18)

Aside from moving the functions, the only changes are to make them non-static, and to use the existing info arrays within aarch64-common.cc instead of the info arrays remaining in aarch64.cc.

gcc/ChangeLog:
    * common/config/aarch64/aarch64-common.cc (aarch64_get_all_extension_candidates): Move within file.
    (aarch64_print_hint_candidates): Move from aarch64.cc.
    (aarch64_print_hint_for_extensions): Ditto.
    (aarch64_print_hint_for_arch): Ditto.
    (aarch64_print_hint_for_core): Ditto.
    (enum aarch_parse_opt_result): Ditto.
    (aarch64_parse_arch): Ditto.
    (aarch64_parse_cpu): Ditto.
    (aarch64_parse_tune): Ditto.
    (aarch64_validate_march): Ditto.
    (aarch64_validate_mcpu): Ditto.
    (aarch64_validate_mtune): Ditto.
    * config/aarch64/aarch64-protos.h (aarch64_rewrite_selected_cpu): Move within file.
    (aarch64_print_hint_for_extensions): Share function prototype.
    (aarch64_print_hint_for_arch): Ditto.
    (aarch64_print_hint_for_core): Ditto.
    (enum aarch_parse_opt_result): Ditto.
    (aarch64_validate_march): Ditto.
    (aarch64_validate_mcpu): Ditto.
    (aarch64_validate_mtune): Ditto.
    (aarch64_get_all_extension_candidates): Unshare prototype.
    * config/aarch64/aarch64.cc (aarch64_parse_arch): Move to aarch64-common.cc.
    (aarch64_parse_cpu): Ditto.
    (aarch64_parse_tune): Ditto.
    (aarch64_print_hint_candidates): Ditto.
    (aarch64_print_hint_for_core): Ditto.
    (aarch64_print_hint_for_arch): Ditto.
    (aarch64_print_hint_for_extensions): Ditto.
    (aarch64_validate_mcpu): Ditto.
    (aarch64_validate_march): Ditto.
    (aarch64_validate_mtune): Ditto.
2025-01-24  aarch64: Inline aarch64_print_hint_for_core_or_arch  (Andrew Carlotti; 1 file, -28/+22)

It seems odd that we add "native" to the list for -march but not for -mcpu. This is probably a bug, but for now we'll preserve the existing behaviour.

gcc/ChangeLog:
    * config/aarch64/aarch64.cc (aarch64_print_hint_candidates): New helper function.
    (aarch64_print_hint_for_core_or_arch): Inline into callers.
    (aarch64_print_hint_for_core): Inline callee and use new helper.
    (aarch64_print_hint_for_arch): Ditto.
    (aarch64_print_hint_for_extensions): Use new helper.
2025-01-24  aarch64: Adjust option parsing parameter types.  (Andrew Carlotti; 1 file, -81/+93)

Replace `const struct processor *` in output parameters with `aarch64_arch` or `aarch64_cpu`. Replace the `std::string` parameter of aarch64_print_hint_for_extensions with `char *`. Also name the return parameters more clearly and consistently.

gcc/ChangeLog:
    * config/aarch64/aarch64.cc (aarch64_print_hint_for_extensions): Receive string as a char *.
    (aarch64_parse_arch): Don't return a const struct processor *.
    (aarch64_parse_cpu): Ditto.
    (aarch64_parse_tune): Ditto.
    (aarch64_validate_mtune): Ditto.
    (aarch64_validate_mcpu): Ditto, and use temporary variables for march/mcpu cross-check.
    (aarch64_validate_march): Ditto.
    (aarch64_override_options): Adjust for changed parameter types.
    (aarch64_handle_attr_arch): Ditto.
    (aarch64_handle_attr_cpu): Ditto.
    (aarch64_handle_attr_tune): Ditto.
2025-01-24  aarch64: Replace duplicate cpu enums  (Andrew Carlotti; 4 files, -22/+15)

Replace `enum aarch64_processor` and `enum target_cpus` with `enum aarch64_cpu`, and prefix the entries with `AARCH64_CPU_`. Also rename aarch64_none to aarch64_no_cpu.

gcc/ChangeLog:
    * config/aarch64/aarch64-opts.h (enum aarch64_processor): Rename to...
    (enum aarch64_cpu): ...this, and rename the entries.
    * config/aarch64/aarch64.cc (aarch64_type): Rename type and initial value.
    (struct processor): Rename member types.
    (all_architectures): Rename enum members.
    (all_cores): Ditto.
    (aarch64_get_tune_cpu): Rename type and enum member.
    * config/aarch64/aarch64.h (enum target_cpus): Remove.
    (TARGET_CPU_DEFAULT): Rename default value.
    (aarch64_tune): Rename type.
    * config/aarch64/aarch64.opt (selected_tune): Rename type and default value.
2025-01-24  aarch64: Improve mcpu/march conflict check  (Andrew Carlotti; 1 file, -5/+2)

Features from a cpu or base architecture that were explicitly disabled by a +nofeat option were being incorrectly added back in before checking for conflicts between -mcpu and -march options. This patch instead compares the returned feature masks directly.

gcc/ChangeLog:
    * config/aarch64/aarch64.cc (aarch64_override_options): Compare returned feature masks directly.
gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/target_attr_crypto_ice_1.c: Prune warning.
    * gcc.target/aarch64/target_attr_crypto_ice_2.c: Ditto.
2025-01-24  c++/modules: Fix linkage checks for exported using-decls  (yxj-github-437; 1 file, -0/+1)

This patch attempts to fix an error when building module std. The reason for the error is that __builtin_va_list (aka struct __va_list) has internal linkage, so mark this builtin type as TREE_PUBLIC to give struct __va_list external linkage.

    g++ -fmodules -std=c++23 -fsearch-include-path bits/std.cc -c
    std.cc:3642:14: error: exporting ‘typedef __gnuc_va_list va_list’ that does not have external linkage
     3642 | using std::va_list;
          |            ^~~~~~~
    <built-in>: note: ‘struct __va_list’ declared here with internal linkage

gcc/ChangeLog:
    * config/aarch64/aarch64.cc (aarch64_build_builtin_va_list): Mark __builtin_va_list as TREE_PUBLIC.
    * config/arm/arm.cc (arm_build_builtin_va_list): Likewise.
gcc/testsuite/ChangeLog:
    * g++.dg/modules/builtin-8.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
2025-01-24  Fix command flags for SVE2 faminmax  (Saurabh Jha; 3 files, -6/+5)

Earlier, we were gating SVE2 faminmax behind sve+faminmax. This was incorrect and this patch changes it so that it is gated behind sve2+faminmax.

gcc/ChangeLog:
    * config/aarch64/aarch64-sve2.md (*aarch64_pred_faminmax_fused): Fix to use the correct flags.
    * config/aarch64/aarch64.h (TARGET_SVE_FAMINMAX): Remove.
    * config/aarch64/iterators.md: Fix iterators so that famax and famin use correct flags.
gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/sve/faminmax_1.c: Fix test to use the correct flags.
    * gcc.target/aarch64/sve/faminmax_2.c: Fix test to use the correct flags.
    * gcc.target/aarch64/sve/faminmax_3.c: New test.
2025-01-23  aarch64: Avoid redundant writes to FPMR  (Richard Sandiford; 3 files, -0/+59)

GCC 15 is the first release to support FP8 intrinsics. The underlying instructions depend on the value of a new register, FPMR. Unlike FPCR, FPMR is a normal call-clobbered/caller-save register rather than a global register. So:

- The FP8 intrinsics take a final uint64_t argument that specifies what value FPMR should have.
- If an FP8 operation is split across multiple functions, it is likely that those functions would have a similar argument.

If the object code has the structure:

    for (...)
      fp8_kernel (..., fpmr_value);

then fp8_kernel would set FPMR to fpmr_value each time it is called, even though FPMR will already have that value for at least the second and subsequent calls (and possibly the first).

The working assumption for the ABI has been that writes to registers like FPMR can in general be more expensive than reads and so it would be better to use a conditional write like:

    mrs tmp, fpmr
    cmp tmp, <value>
    beq 1f
    msr fpmr, <value>
    1:

instead of writing the same value to FPMR repeatedly.

This patch implements that. It also adds a tuning flag that suppresses the behaviour, both to make testing easier and to support any future cores that (for example) are able to rename FPMR.

Hopefully this really is the last part of the FP8 enablement.

gcc/
    * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE): New tuning flag.
    * config/aarch64/aarch64.h (TARGET_CHEAP_FPMR_WRITE): New macro.
    * config/aarch64/aarch64.md: Split moves into FPMR into a test and branch around.
    (aarch64_write_fpmr): New pattern.
gcc/testsuite/
    * g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Add cheap_fpmr_write by default.
    * gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
    * gcc.target/aarch64/acle/fp8.c: Add cheap_fpmr_write.
    * gcc.target/aarch64/acle/fpmr-2.c: Likewise.
    * gcc.target/aarch64/simd/vcvt_fpm.c: Likewise.
    * gcc.target/aarch64/simd/vdot2_fpm.c: Likewise.
    * gcc.target/aarch64/simd/vdot4_fpm.c: Likewise.
    * gcc.target/aarch64/simd/vmla_fpm.c: Likewise.
    * gcc.target/aarch64/acle/fpmr-6.c: New test.
2025-01-23  aarch64: Fix memory cost for FPM_REGNUM  (Richard Sandiford; 1 file, -2/+9)

GCC 15 is going to be the first release to support FPMR. While working on a follow-up patch, I noticed that for:

    (set (reg:DI R) ...)
    ...
    (set (reg:DI fpmr) (reg:DI R))

IRA would prefer to spill R to memory rather than allocate a GPR. This is because the register move cost for GENERAL_REGS to MOVEABLE_SYSREGS is very high:

    /* Moves to/from sysregs are expensive, and must go via GPR.  */
    if (from == MOVEABLE_SYSREGS)
      return 80 + aarch64_register_move_cost (mode, GENERAL_REGS, to);
    if (to == MOVEABLE_SYSREGS)
      return 80 + aarch64_register_move_cost (mode, from, GENERAL_REGS);

but the memory cost for MOVEABLE_SYSREGS was the same as for GENERAL_REGS, making memory much cheaper. Loading and storing FPMR involves a GPR temporary, so the cost should account for moving into and out of that temporary.

This did show up indirectly in some of the existing asm tests, where the stack frame allocated 16 bytes for callee saves (D8) and another 16 bytes for spilling a temporary register.

It's possible that other registers need the same treatment and it's more than probable that this code needs a rework. None of that seems suitable for stage 4 though.

gcc/
    * config/aarch64/aarch64.cc (aarch64_memory_move_cost): Account for the cost of moving in and out of GENERAL_SYSREGS.
gcc/testsuite/
    * gcc.target/aarch64/acle/fpmr-5.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Don't expect a spill slot to be allocated.
    * gcc.target/aarch64/sve2/acle/asm/mlalb_lane_mf8.c: Likewise.
    * gcc.target/aarch64/sve2/acle/asm/mlallbb_lane_mf8.c: Likewise.
    * gcc.target/aarch64/sve2/acle/asm/mlallbt_lane_mf8.c: Likewise.
    * gcc.target/aarch64/sve2/acle/asm/mlalltb_lane_mf8.c: Likewise.
    * gcc.target/aarch64/sve2/acle/asm/mlalltt_lane_mf8.c: Likewise.
    * gcc.target/aarch64/sve2/acle/asm/mlalt_lane_mf8.c: Likewise.
2025-01-23  aarch64: Allow FPMR source values to be zero  (Richard Sandiford; 1 file, -25/+25)

GCC 15 is going to be the first release to support FPMR. The alternatives for moving values into FPMR were missing a zero alternative, meaning that moves of zero would use an unnecessary temporary register.

gcc/
    * config/aarch64/aarch64.md (*mov<SHORT:mode>_aarch64)
    (*movsi_aarch64, *movdi_aarch64): Allow the source of an MSR to be zero.
gcc/testsuite/
    * gcc.target/aarch64/acle/fp8.c: Add tests for moving zero into FPMR.
2025-01-22  aarch64: Fix aarch64_write_sysregdi predicate  (Richard Sandiford; 1 file, -1/+1)

While working on another MSR-related patch, I noticed that aarch64_write_sysregdi's constraints allowed zero, but its predicate didn't. This could in principle lead to an ICE during or after RA, since "Z" allows the RA to rematerialise a known zero directly into the instruction.

The usual techniques for exposing a bug like that didn't work in this case, since the optimisers seem to make no attempt to remove redundant zero moves (at least not for these unspec_volatiles). But the problem still seems worth fixing pre-emptively.

gcc/
    * config/aarch64/aarch64.md (aarch64_write_sysregdi): Change the source predicate to aarch64_reg_or_zero.
gcc/testsuite/
    * gcc.target/aarch64/acle/rwsr-4.c: New test.
    * gcc.target/aarch64/acle/rwsr-armv8p9.c: Avoid read of uninitialized variable.
2025-01-21  Regenerate aarch64.opt.urls  (Alfie Richards; 1 file, -0/+3)

This updates aarch64.opt.urls after my patch earlier today. Pushing directly as it's an obvious fix.

gcc/ChangeLog:
    * config/aarch64/aarch64.opt.urls: Regenerate.
2025-01-21  AArch64: Add LUTI ACLE for SVE2  (Vladimir Miloserdov; 9 files, -1/+125)

This patch introduces support for the LUTI2/LUTI4 ACLE for SVE2.

LUTI instructions are used for efficient table lookups with 2-bit or 4-bit indices. LUTI2 reads indexed 8-bit or 16-bit elements from the low 128 bits of the table vector using packed 2-bit indices, while LUTI4 can read from the low 128 or 256 bits of the table vector or from two table vectors using packed 4-bit indices. These instructions fill the destination vector by copying elements indexed by segments of the source vector, selected by the vector segment index.

The changes include the addition of a new AArch64 option extension "lut", the __ARM_FEATURE_LUT preprocessor macro, definitions for the new LUTI instruction shapes, and implementations of the svluti2 and svluti4 builtins.

gcc/ChangeLog:
    * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Add new flag TARGET_LUT.
    * config/aarch64/aarch64-sve-builtins-shapes.cc (struct luti_base): Shape for lut intrinsics.
    (SHAPE): Specializations for lut shapes for luti2 and luti4.
    * config/aarch64/aarch64-sve-builtins-shapes.h: Declare lut intrinsics.
    * config/aarch64/aarch64-sve-builtins-sve2.cc (class svluti_lane_impl): Define expand for lut intrinsics.
    (FUNCTION): Define expand for lut intrinsics.
    * config/aarch64/aarch64-sve-builtins-sve2.def (REQUIRED_EXTENSIONS): Declare lut intrinsics behind lut flag.
    (svluti2_lane): Define intrinsic behind flag.
    (svluti4_lane): Define intrinsic behind flag.
    * config/aarch64/aarch64-sve-builtins-sve2.h: Declare lut intrinsics.
    * config/aarch64/aarch64-sve-builtins.cc (TYPES_bh_data): New type for byte and halfword.
    (bh_data): Type array for byte and halfword.
    (h_data): Type array for halfword.
    * config/aarch64/aarch64-sve2.md (@aarch64_sve_luti<LUTI_BITS><mode>): Instruction patterns for lut intrinsics.
    * config/aarch64/iterators.md: Iterators and attributes for lut intrinsics.
gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: New test macro.
    * lib/target-supports.exp: Add lut flag to the for loop.
    * gcc.target/aarch64/sve/acle/general-c/lut_1.c: New test.
    * gcc.target/aarch64/sve/acle/general-c/lut_2.c: New test.
    * gcc.target/aarch64/sve/acle/general-c/lut_3.c: New test.
    * gcc.target/aarch64/sve/acle/general-c/lut_4.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/luti2_f16.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/luti2_s16.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/luti2_s8.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/luti2_u16.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/luti2_u8.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/luti4_bf16_x2.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/luti4_f16.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/luti4_f16_x2.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/luti4_s16.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/luti4_s16_x2.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/luti4_s8.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/luti4_u16.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/luti4_u16_x2.c: New test.
    * gcc.target/aarch64/sve2/acle/asm/luti4_u8.c: New test.
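A hedged usage sketch (the exact prototype is an assumption inferred from the intrinsic names above, and the compile flags are likewise assumed, something like -march=armv9-a+sve2+lut):

    #include <arm_sve.h>

    /* Illustrative only: each packed 2-bit field of `indices` selects an
       element from the low 128 bits of `table`; the argument order and
       the trailing segment-index immediate are assumed from the naming.  */
    svuint8_t
    lut2 (svuint8_t table, svuint8_t indices)
    {
      return svluti2_lane_u8 (table, indices, 0);
    }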
2025-01-21  Add warning for non-spec compliant FMV in AArch64  (Alfie Richards; 2 files, -0/+13)

This patch adds a warning when FMV is used for AArch64. The reasoning is that the ACLE [1] spec for FMV has diverged significantly from the current implementation, and we want to prevent potential future compatibility issues.

There is a patch for an ACLE-compliant version of target_version and target_clone in progress, but it won't make gcc-15.

This has been bootstrapped and regression tested for AArch64. Is this okay for master and backport to gcc-14?

[1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning

gcc/ChangeLog:
    * config/aarch64/aarch64.cc (aarch64_process_target_version_attr): Add experimental warning.
    * config/aarch64/aarch64.opt: Add command line option to disable warning.
    * doc/invoke.texi: Add documentation for -W[no-]experimental-fmv-target.
gcc/testsuite/ChangeLog:
    * g++.target/aarch64/mv-1.C: Add CLI flag.
    * g++.target/aarch64/mv-symbols1.C: Add CLI flag.
    * g++.target/aarch64/mv-symbols2.C: Add CLI flag.
    * g++.target/aarch64/mv-symbols3.C: Add CLI flag.
    * g++.target/aarch64/mv-symbols4.C: Add CLI flag.
    * g++.target/aarch64/mv-symbols5.C: Add CLI flag.
    * g++.target/aarch64/mv-warning1.C: New test.
    * g++.target/aarch64/mvc-symbols1.C: Add CLI flag.
    * g++.target/aarch64/mvc-symbols2.C: Add CLI flag.
    * g++.target/aarch64/mvc-symbols3.C: Add CLI flag.
    * g++.target/aarch64/mvc-symbols4.C: Add CLI flag.
    * g++.target/aarch64/mv-pragma.C: Add CLI flag.
    * g++.target/aarch64/mvc-warning1.C: New test.
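For context, the kind of FMV usage that would now trigger the warning looks roughly like this (a sketch following the current implementation's attribute spelling; C++ source, as in the tests above):

    /* Each version is selected at run time by a resolver.  */
    __attribute__ ((target_version ("default")))
    int impl (void) { return 0; }

    __attribute__ ((target_version ("sve")))
    int impl (void) { return 1; }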
2025-01-20  aarch64: Fix invalid subregs in xorsign [PR118501]  (Richard Sandiford; 1 file, -2/+2)

In the testcase, we try to use xorsign on:

    (subreg:DF (reg:TI R) 8)

i.e. the highpart of the TI. xorsign wants to take a V2DF paradoxical subreg of this, which is rightly rejected as a direct operation. In cases like this, we need to force the highpart into a fresh register first.

gcc/
    PR target/118501
    * config/aarch64/aarch64.md (@xorsign<mode>3): Use force_lowpart_subreg.
gcc/testsuite/
    PR target/118501
    * gcc.c-torture/compile/pr118501.c: New test.
2025-01-20  aarch64: Add missing simd requirements for INS [PR118531]  (Richard Sandiford; 1 file, -3/+6)

In g:b096a6ebe9d9f9fed4c105f6555f724eb32af95c I'd forgotten to gate some uses of INS on TARGET_SIMD.

gcc/
    PR target/118531
    * config/aarch64/aarch64.md (*insv_reg<mode>_<SUBDI_BITS>)
    (*aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>)
    (*aarch64_bfidi<ALLX:mode>_subreg_<SUBDI_BITS>): Add missing simd requirements.
gcc/testsuite/
    * gcc.target/aarch64/ins_bitfield_1a.c: New test.
    * gcc.target/aarch64/ins_bitfield_3a.c: Likewise.
    * gcc.target/aarch64/ins_bitfield_5a.c: Likewise.
2025-01-18  AArch64: Use standard names for saturating arithmetic  (Akram Ahmad; 5 files, -56/+271)

This renames the existing {s,u}q{add,sub} instructions to use the standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and IFN_SAT_SUB. The NEON intrinsics for saturating arithmetic and their corresponding builtins are changed to use these standard names too.

Using the standard names for the instructions causes 32- and 64-bit unsigned scalar saturating arithmetic to use the NEON instructions, resulting in an additional (and inefficient) FMOV to be generated when the original operands are in GP registers. This patch therefore also restores the original behaviour of using the adds/subs instructions in this circumstance.

Additional tests are written for the scalar and Adv. SIMD cases to ensure that the correct instructions are used. The NEON intrinsics are already tested elsewhere. A worked example of the scalar idiom involved appears after this entry.

gcc/ChangeLog:
    * config/aarch64/aarch64-builtins.cc: Expand iterators.
    * config/aarch64/aarch64-simd-builtins.def: Use standard names.
    * config/aarch64/aarch64-simd.md: Use standard names, split insn definitions on signedness of operator and type of operands.
    * config/aarch64/arm_neon.h: Use standard builtin names.
    * config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to simplify splitting of insn for unsigned scalar arithmetic.
gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/scalar_intrinsics.c: Update testcases.
    * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc: Template file for unsigned vector saturating arithmetic tests.
    * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c: 8-bit vector type tests.
    * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c: 16-bit vector type tests.
    * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c: 32-bit vector type tests.
    * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c: 64-bit vector type tests.
    * gcc.target/aarch64/saturating_arithmetic.inc: Template file for scalar saturating arithmetic tests.
    * gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
    * gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
    * gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
    * gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.

Co-authored-by: Tamar Christina <tamar.christina@arm.com>
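One common scalar form that the middle end can recognise as IFN_SAT_ADD and route to these renamed patterns (illustrative, not taken from the tests above):

    #include <stdint.h>

    /* Unsigned saturating add: on overflow, sum < a is true, so the mask
       -(sum < a) is all-ones and the result saturates to UINT32_MAX.  */
    uint32_t
    sat_add_u32 (uint32_t a, uint32_t b)
    {
      uint32_t sum = a + b;
      return sum | -(uint32_t) (sum < a);
    }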
2025-01-18  AArch64: Use standard names for SVE saturating arithmetic  (Akram Ahmad; 1 file, -2/+2)

Rename the existing SVE unpredicated saturating arithmetic instructions to use standard names which are used by IFN_SAT_ADD and IFN_SAT_SUB.

gcc/ChangeLog:
    * config/aarch64/aarch64-sve.md: Rename insns.
gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/sve/saturating_arithmetic.inc: Template file for auto-vectorizer tests.
    * gcc.target/aarch64/sve/saturating_arithmetic_1.c: Instantiate 8-bit vector tests.
    * gcc.target/aarch64/sve/saturating_arithmetic_2.c: Instantiate 16-bit vector tests.
    * gcc.target/aarch64/sve/saturating_arithmetic_3.c: Instantiate 32-bit vector tests.
    * gcc.target/aarch64/sve/saturating_arithmetic_4.c: Instantiate 64-bit vector tests.
2025-01-18  Revert "AArch64: Use standard names for saturating arithmetic"  (Tamar Christina; 5 files, -271/+56)
This reverts commit 5f5833a4107ddfbcd87651bf140151de043f4c36.
2025-01-18  Revert "AArch64: Use standard names for SVE saturating arithmetic"  (Tamar Christina; 1 file, -2/+2)
This reverts commit 26b2d9f27ca24f0705641a85f29d179fa0600869.
2025-01-17  AArch64: Use standard names for SVE saturating arithmetic  (Tamar Christina; 1 file, -2/+2)

Rename the existing SVE unpredicated saturating arithmetic instructions to use standard names which are used by IFN_SAT_ADD and IFN_SAT_SUB.

gcc/ChangeLog:
    * config/aarch64/aarch64-sve.md: Rename insns.
gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/sve/saturating_arithmetic.inc: Template file for auto-vectorizer tests.
    * gcc.target/aarch64/sve/saturating_arithmetic_1.c: Instantiate 8-bit vector tests.
    * gcc.target/aarch64/sve/saturating_arithmetic_2.c: Instantiate 16-bit vector tests.
    * gcc.target/aarch64/sve/saturating_arithmetic_3.c: Instantiate 32-bit vector tests.
    * gcc.target/aarch64/sve/saturating_arithmetic_4.c: Instantiate 64-bit vector tests.
2025-01-17  AArch64: Use standard names for saturating arithmetic  (Tamar Christina; 5 files, -56/+271)

This renames the existing {s,u}q{add,sub} instructions to use the standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and IFN_SAT_SUB. The NEON intrinsics for saturating arithmetic and their corresponding builtins are changed to use these standard names too.

Using the standard names for the instructions causes 32- and 64-bit unsigned scalar saturating arithmetic to use the NEON instructions, resulting in an additional (and inefficient) FMOV to be generated when the original operands are in GP registers. This patch therefore also restores the original behaviour of using the adds/subs instructions in this circumstance.

Additional tests are written for the scalar and Adv. SIMD cases to ensure that the correct instructions are used. The NEON intrinsics are already tested elsewhere.

gcc/ChangeLog:
    * config/aarch64/aarch64-builtins.cc: Expand iterators.
    * config/aarch64/aarch64-simd-builtins.def: Use standard names.
    * config/aarch64/aarch64-simd.md: Use standard names, split insn definitions on signedness of operator and type of operands.
    * config/aarch64/arm_neon.h: Use standard builtin names.
    * config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to simplify splitting of insn for unsigned scalar arithmetic.
gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/scalar_intrinsics.c: Update testcases.
    * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc: Template file for unsigned vector saturating arithmetic tests.
    * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c: 8-bit vector type tests.
    * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c: 16-bit vector type tests.
    * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c: 32-bit vector type tests.
    * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c: 64-bit vector type tests.
    * gcc.target/aarch64/saturating_arithmetic.inc: Template file for scalar saturating arithmetic tests.
    * gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
    * gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
    * gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
    * gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.

Co-authored-by: Tamar Christina <tamar.christina@arm.com>
2025-01-16  AArch64: have -mcpu=native detect architecture extensions for unknown non-homogenous systems [PR113257]  (Tamar Christina; 1 file, -14/+38)

In g:e91a17fe39c39e98cebe6e1cbc8064ee6846a3a7 we added the ability for -mcpu=native on unknown CPUs to still enable architecture extensions. This has worked great, but was only added for homogenous systems.

However, the same thing works for big.LITTLE, since in such a system the cores must have the same extensions, otherwise it doesn't fundamentally work; i.e. task migration from one core to the other wouldn't work. This extends the same handling to non-homogenous systems.

gcc/ChangeLog:
    PR target/113257
    * config/aarch64/driver-aarch64.cc (get_cpu_from_id, DEFAULT_CPU): New.
    (host_detect_local_cpu): Use it.
gcc/testsuite/ChangeLog:
    PR target/113257
    * gcc.target/aarch64/cpunative/info_34: New test.
    * gcc.target/aarch64/cpunative/native_cpu_34.c: New test.
    * gcc.target/aarch64/cpunative/info_35: New test.
    * gcc.target/aarch64/cpunative/native_cpu_35.c: New test.

Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
2025-01-16  AArch64: don't override march to assembler with mcpu if march is specified [PR110901]  (Tamar Christina; 1 file, -1/+1)

When both -mcpu and -march are specified, the value of -march wins out. This is done correctly for the calls to cc1 and for the assembler directives we put out in assembly files.

However in the call to as (the assembler) we don't do this, and instead use the arch from the cpu. This leads to a situation where GCC cannot reliably be used to compile assembly files which don't have a .arch directive. This is quite common with .S files which use macros to selectively enable code paths based on what the preprocessor sees.

The fix is to change MCPU_TO_MARCH_SPEC to not override the march if an march is already specified.

gcc/ChangeLog:
    PR target/110901
    * config/aarch64/aarch64.h (MCPU_TO_MARCH_SPEC): Don't override if march is set.
gcc/testsuite/ChangeLog:
    PR target/110901
    * gcc.target/aarch64/options_set_29.c: New test.
2025-01-15  AArch64: Update neoverse512tvb tuning  (Wilco Dijkstra; 1 file, -2/+4)

Fix the neoverse512tvb tuning to be like Neoverse V1/V2 and add the missing AARCH64_EXTRA_TUNE_BASE and AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.

gcc:
    * config/aarch64/tuning_models/neoverse512tvb.h (tune_flags): Update.
2025-01-15  AArch64: Add FULLY_PIPELINED_FMA to tune baseline  (Wilco Dijkstra; 3 files, -5/+4)

Add FULLY_PIPELINED_FMA to the tune baseline - this is a generic feature that is already enabled for some cores, but benchmarking shows it is faster on all modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1).

gcc:
    * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE): Add AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
    * config/aarch64/tuning_models/ampere1b.h: Remove redundant AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
    * config/aarch64/tuning_models/neoversev2.h: Likewise.
2025-01-15  AArch64: Deprecate -mabi=ilp32  (Wilco Dijkstra; 1 file, -0/+2)

ILP32 was originally intended to make porting to AArch64 easier. Support was never merged in the Linux kernel or GLIBC, so it has been unsupported for many years. There isn't a benefit in keeping unsupported features forever, so deprecate it now (and it could be removed in a future release).

gcc:
    * config/aarch64/aarch64.cc (aarch64_override_options): Add warning.
    * doc/invoke.texi: Document -mabi=ilp32 as deprecated.
gcc/testsuite:
    * gcc.target/aarch64/inline-mem-set-pr112804.c: Add -Wno-deprecated.
    * gcc.target/aarch64/pr100518.c: Likewise.
    * gcc.target/aarch64/pr113114.c: Likewise.
    * gcc.target/aarch64/pr80295.c: Likewise.
    * gcc.target/aarch64/pr94201.c: Likewise.
    * gcc.target/aarch64/pr94577.c: Likewise.
    * gcc.target/aarch64/sve/pr108603.c: Likewise.
2025-01-10  AArch64: correct Cortex-X4 MIDR  (Tamar Christina; 1 file, -1/+1)

The Parts Num field of the MIDR for Cortex-X4 is wrong. It's currently the parts number for a Cortex-A720 (whose own entry does have the right number). The correct number can be found in the Cortex-X4 Technical Reference Manual [1] on page 382 in Issue Number 5.

[1] https://developer.arm.com/documentation/102484/latest/

gcc/ChangeLog:
    * config/aarch64/aarch64-cores.def (AARCH64_CORE): Fix cortex-x4 parts num.
2025-01-10  Add new hardreg PRE pass  (Andrew Carlotti; 1 file, -0/+4)

This pass is used to optimise assignments to the FPMR register in aarch64. I chose to implement this as a middle-end pass because it mostly reuses the existing RTL PRE code within gcse.cc.

Compared to RTL PRE, the key difference in this new pass is that we insert new writes directly to the destination hardreg, instead of writing to a new pseudo-register and copying the result later. This requires changes to the analysis portion of the pass, because sets cannot be moved before existing instructions that set, use or clobber the hardreg, and the value becomes unavailable after any uses or clobbers of the hardreg.

Any uses of the hardreg in debug insns will be deleted. We could do better than this, but for the aarch64 fpmr I don't think we emit useful debuginfo for deleted fp8 instructions anyway (and I don't even know if it's possible to have a debug fpmr use when entering hardreg PRE).

gcc/ChangeLog:
    * config/aarch64/aarch64.h (HARDREG_PRE_REGNOS): New macro.
    * gcse.cc (doing_hardreg_pre_p): New global variable.
    (do_load_motion): New boolean check.
    (current_hardreg_regno): New global variable.
    (compute_local_properties): Unset transp for hardreg clobbers.
    (prune_hardreg_uses): New function.
    (want_to_gcse_p): Use different checks for hardreg PRE.
    (oprs_unchanged_p): Disable load motion for hardreg PRE pass.
    (hash_scan_set): For hardreg PRE, skip non-hardreg sets and check for hardreg clobbers.
    (record_last_mem_set_info): Skip for hardreg PRE.
    (compute_pre_data): Prune hardreg uses from transp bitmap.
    (pre_expr_reaches_here_p_work): Add sentence to comment.
    (insert_insn_start_basic_block): New functions.
    (pre_edge_insert): Don't add hardreg sets to predecessor block.
    (pre_delete): Use hardreg for the reaching reg.
    (reset_hardreg_debug_uses): New function.
    (pre_gcse): For hardreg PRE, reset debug uses and don't insert copies.
    (one_pre_gcse_pass): Disable load motion for hardreg PRE.
    (execute_hardreg_pre): New.
    (class pass_hardreg_pre): New.
    (pass_hardreg_pre::gate): New.
    (make_pass_hardreg_pre): New.
    * passes.def (pass_hardreg_pre): New pass.
    * tree-pass.h (make_pass_hardreg_pre): New.
gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/acle/fpmr-1.c: New test.
    * gcc.target/aarch64/acle/fpmr-2.c: New test.
    * gcc.target/aarch64/acle/fpmr-3.c: New test.
    * gcc.target/aarch64/acle/fpmr-4.c: New test.
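Conceptually (an illustration of the effect, not pass output), hoisting a loop-invariant FPMR assignment looks like:

    Before hardreg PRE:            After hardreg PRE:
        loop:                          set fpmr, value
          set fpmr, value              loop:
          insn using fpmr                insn using fpmr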
2025-01-10  aarch64: Add new +xs flag  (Andrew Carlotti; 2 files, -1/+3)

GCC does not emit tlbi instructions, so this only affects the flags passed through to the assembler.

gcc/ChangeLog:
    * config/aarch64/aarch64-arches.def (V8_7A): Add XS.
    * config/aarch64/aarch64-option-extensions.def (XS): New flag.
2025-01-10  aarch64: Add new +wfxt flag  (Andrew Carlotti; 2 files, -1/+3)

GCC does not currently emit the wfet or wfit instructions, so this primarily affects the flags passed through to the assembler.

gcc/ChangeLog:
    * config/aarch64/aarch64-arches.def (V8_7A): Add WFXT.
    * config/aarch64/aarch64-option-extensions.def (WFXT): New flag.
2025-01-10  aarch64: Add new +rcpc2 flag  (Andrew Carlotti; 3 files, -3/+5)

gcc/ChangeLog:
    * config/aarch64/aarch64-arches.def (V8_4A): Add RCPC2.
    * config/aarch64/aarch64-option-extensions.def (RCPC2): New flag.
    (RCPC3): Add RCPC2 dependency.
    * config/aarch64/aarch64.h (TARGET_RCPC2): Use new flag.
gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/cpunative/native_cpu_21.c: Add rcpc2 to expected feature string instead of rcpc.
    * gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
2025-01-10  aarch64: Add new +flagm2 flag  (Andrew Carlotti; 2 files, -1/+3)

GCC does not currently emit the axflag or xaflag instructions, so this primarily affects the flags passed through to the assembler.

gcc/ChangeLog:
    * config/aarch64/aarch64-arches.def (V8_5A): Add FLAGM2.
    * config/aarch64/aarch64-option-extensions.def (FLAGM2): New flag.
gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/cpunative/native_cpu_21.c: Add flagm2 to expected feature string instead of flagm.
    * gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
2025-01-10  aarch64: Add new +frintts flag  (Andrew Carlotti; 5 files, -4/+6)

gcc/ChangeLog:
    * config/aarch64/aarch64-arches.def (V8_5A): Add FRINTTS.
    * config/aarch64/aarch64-option-extensions.def (FRINTTS): New flag.
    * config/aarch64/aarch64.h (TARGET_FRINT): Use new flag.
    * config/aarch64/arm_acle.h: Use new flag for frintts intrinsics.
    * config/aarch64/arm_neon.h: Ditto.
gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/cpunative/native_cpu_21.c: Add frintts to expected feature string.
    * gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
2025-01-10  aarch64: Add new +jscvt flag  (Andrew Carlotti; 4 files, -3/+5)

gcc/ChangeLog:
    * config/aarch64/aarch64-arches.def (V8_3A): Add JSCVT.
    * config/aarch64/aarch64-option-extensions.def (JSCVT): New flag.
    * config/aarch64/aarch64.h (TARGET_JSCVT): Use new flag.
    * config/aarch64/arm_acle.h: Use new flag for jscvt intrinsics.
gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/cpunative/native_cpu_21.c: Add jscvt to expected feature string.
    * gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
2025-01-10  aarch64: Add new +fcma flag  (Andrew Carlotti; 4 files, -4/+6)

This includes +fcma as a dependency of +sve, and means that we can finally support fcma intrinsics on a64fx. Also add fcma to the Features list in several cpunative testcases that incorrectly included sve without fcma.

gcc/ChangeLog:
    * config/aarch64/aarch64-arches.def (V8_3A): Add FCMA.
    * config/aarch64/aarch64-option-extensions.def (FCMA): New flag.
    (SVE): Add FCMA dependency.
    * config/aarch64/aarch64.h (TARGET_COMPLEX): Use new flag.
    * config/aarch64/arm_neon.h: Use new flag for fcma intrinsics.
gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/cpunative/info_15: Add fcma to Features.
    * gcc.target/aarch64/cpunative/info_16: Ditto.
    * gcc.target/aarch64/cpunative/info_17: Ditto.
    * gcc.target/aarch64/cpunative/info_8: Ditto.
    * gcc.target/aarch64/cpunative/info_9: Ditto.
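An illustrative use of one of the intrinsics now gated behind +fcma (real ACLE intrinsic; the exact -march spelling needed is an assumption, e.g. -march=armv8.3-a or any target with +fcma):

    #include <arm_neon.h>

    /* Fused complex multiply-accumulate with rotation 0, previously
       requiring the full Armv8.3-A baseline.  */
    float32x4_t
    cmla0 (float32x4_t acc, float32x4_t a, float32x4_t b)
    {
      return vcmlaq_f32 (acc, a, b);
    }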
2025-01-10  aarch64: Use PAUTH instead of V8_3A in some places  (Andrew Carlotti; 2 files, -7/+7)

gcc/ChangeLog:
    * config/aarch64/aarch64.cc (aarch64_expand_epilogue): Use TARGET_PAUTH.
    * config/aarch64/aarch64.md: Update comment.