While looking at PR118956, I noticed that we had some dead code
left over after the removal of the vcond patterns. The can_invert_p
path is no longer used.
gcc/
* config/aarch64/aarch64-protos.h (aarch64_expand_sve_vec_cmp_float):
Remove can_invert_p argument and change return type to void.
* config/aarch64/aarch64.cc (aarch64_expand_sve_vec_cmp_float):
Likewise.
* config/aarch64/aarch64-sve.md (vec_cmp<mode><vpred>): Update call
accordingly.
|
|
generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses
generic_prefetch_tune in generic_armv8_a_tunings.
This patch updates the pointer to generic_armv8_a_prefetch_tune.
This patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:
* config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch
struct pointer.
|
|
So the change to prefer ADD over IOR for combining two objects with no bits in
common is (IMHO) generally good. It has some minor fallout.
In particular, the aarch64 port (and I suspect others) has patterns that
recognize IOR, but not PLUS or XOR, for these cases, and thus tests which
expected the IOR form to be optimized are no longer optimizing.
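As a rough illustration (my own sketch, not the bitint-args.c test itself),
the affected patterns are the ones that fuse two shifts whose results have
no bits in common:
  unsigned long long
  fuse_shifts (unsigned long long x, unsigned long long y)
  {
    /* The two shifted values share no bits, so combine may now present
       the combination as PLUS or XOR rather than IOR; the extr patterns
       need to accept all three forms to keep matching.  */
    return (x << 48) | (y >> 16);
  }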
Roger suggested using a code iterator for this purpose. Richard S. suggested a
new match operator to cover those cases.
I really like the match operator idea, but as Richard S. notes in the PR, it
would require either not validating the "no bits in common" condition, which
IMHO dramatically reduces its utility, or some extra work to allow consistent
results without polluting the nonzero bits cache.
So this patch goes back to Roger's idea of just using a code iterator in the
aarch64 backend (and presumably anywhere else we see this popping up).
Bootstrapped and regression tested on aarch64-linux-gnu where it fixes
bitint-args.c (as expected).
PR target/115478
gcc/
* config/aarch64/iterators.md (any_or_plus): New code iterator.
* config/aarch64/aarch64.md (extr<mode>5_insn): Use any_or_plus.
(extr<mode>5_insn_alt, extrsi5_insn_uxtw): Likewise.
(extrsi5_insn_uxtw_alt, extrsi5_insn_di): Likewise.
gcc/testsuite/
* gcc.target/aarch64/bitint-args.c: Update expected output.
|
|
We agreed with LLVM developers to not enforce the architectural
dependencies between fp8 multiplication features, and they have already
been removed from LLVM and Binutils. Remove them from GCC as well.
gcc/ChangeLog:
* config/aarch64/aarch64-option-extensions.def
(SSVE_FP8FMA): Adjust formatting.
(FP8DOT4): Replace FP8FMA dependency with FP8.
(SSVE_FP8DOT4): Replace SSVE_FP8FMA dependency with SME2+FP8.
(FP8DOT2): Replace FP8DOT4 dependency with FP8.
(SSVE_FP8DOT2): Replace SSVE_FP8DOT4 dependency with SME2+FP8.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pragma_cpp_predefs_4.c: Adjust expected
defines.
* gcc.target/aarch64/simd/vmla_lane_indices_1.c: Modify target
pragmas.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c:
Ditto.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c:
Ditto.
* gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Ditto.
* gcc.target/aarch64/sve2/acle/asm/dot_mf8.c: Ditto.
|
|
Instead of waiting for the combine/RTL optimizations to be fixed here, this
fixes the builtins at the gimple level. It should also give slightly faster
compile times, since the simplification happens earlier on.
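For example (a minimal sketch of the kind of input the fold targets; the
exact shapes handled by aarch64_fold_aes_op may be broader), AESE/AESD XOR
their two operands internally, so an explicit XOR feeding the intrinsic
alongside a zero operand can be folded into the builtin's operands:
  #include <arm_neon.h>
  #pragma GCC target "+aes"

  uint8x16_t
  aes_round (uint8x16_t state, uint8x16_t key)
  {
    /* Equivalent to vaeseq_u8 (state, key); the gimple fold can move the
       XOR operands into the builtin instead of leaving it for combine.  */
    return vaeseq_u8 (state ^ key, vdupq_n_u8 (0));
  }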
Built and tested for aarch64-linux-gnu.
gcc/ChangeLog:
PR target/114522
* config/aarch64/aarch64-builtins.cc (aarch64_fold_aes_op): New function.
(aarch64_general_gimple_fold_builtin): Call aarch64_fold_aes_op for crypto_aese
and crypto_aesd.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
With release checking we get an uninitialized-use warning
inside aarch64_split_move because of jump threading for the case of `npieces==0`,
but `npieces` is never 0 (though there is no way the compiler can know that).
So this fixes the issue by adding a `gcc_assert` to the function which asserts
that `npieces > 0`, and that also fixes the warning.
Bootstrapped and tested on aarch64-linux-gnu (with and without --enable-checking=release).
The warning:
aarch64.cc: In function 'void aarch64_split_move(rtx, rtx, machine_mode)':
aarch64.cc:3418:31: error: '*(rtx_def**)((char*)&dst_pieces + offsetof(auto_vec<rtx_def*, 4>,auto_vec<rtx_def*, 4>::m_data[0]))' may be used uninitialized [-Werror=maybe-uninitialized]
3418 | if (reg_overlap_mentioned_p (dst_pieces[0], src))
| ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
aarch64.cc:3408:20: note: 'dst_pieces' declared here
3408 | auto_vec<rtx, 4> dst_pieces, src_pieces;
| ^~~~~~~~~~
PR target/118771
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_split_move): Assert that npieces is
greater than 0.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
gcc.target/aarch64/sve/acle/general/ldff1_8.c and
gcc.target/aarch64/sve/ptest_1.c were failing because the
aarch64 port was giving a zero (unknown) cost to instructions
that compute two results in parallel. This was latent until
r15-1575-gea8061f46a30, which fixed rtl-ssa to treat zero costs
as unknown.
A long-standing todo here is to make insn_cost derive costs from md
information, rather than having to write a lot of matching code in
aarch64_rtx_costs. But that's not something we can do for GCC 15.
This patch instead treats the cost of a PARALLEL as being the maximum
cost of its constituent sets. I don't like this very much, since it
isn't really target-specific behaviour. If it were stage 1, I'd be
trying to change pattern_cost instead.
gcc/
* config/aarch64/aarch64.cc (aarch64_insn_cost): Give PARALLELs
the same cost as the costliest SET.
|
|
This patch fixes the dupq_* testsuite failures. The tests were
introduced with r15-3669-ga92f54f580c3 (which was a nice improvement)
and Pengxuan originally had a follow-on patch to recognise INDEX
constants during vec_init.
I'd originally wanted to solve this a different way, using wildcards
when building a vector and letting vector_builder::finalize find the
best way of filling them in. I no longer think that's the best
approach though. Stepped constants are likely to be more expensive
than unstepped constants, so we should first try finding an unstepped
constant that is valid, even if it has a longer representation than
the stepped version.
This patch therefore uses a variant of Pengxuan's idea.
While there, I noticed that the (old) code for finding an unstepped
constant only tried varying one bit at a time. So for index 0 in a
16-element constant, the code would try picking a constant from index 8,
4, 2, and then 1. But since the goal is to create "fewer, larger,
repeating parts", it would be better to iterate over a bit-reversed
increment, so that after trying an XOR with 0 and 8, we try adding 4
to each previous attempt, then 2 to each previous attempt, and so on.
In the previous example this would give 8, 4, 12, 2, 10, 6, 14, ...
The test shows an example of this for 8 shorts.
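As a rough illustration of the stepped-constant support (a sketch under my
own assumptions, not the dupq_12.c test itself):
  #include <arm_sve.h>
  #pragma GCC target "+sve"

  svint16_t
  stepped_constant (void)
  {
    /* A stepped sequence like this can now be generated with an INDEX
       instruction instead of being loaded from the constant pool.  */
    return svdupq_n_s16 (0, 1, 2, 3, 4, 5, 6, 7);
  }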
gcc/
* config/aarch64/aarch64.cc (aarch64_choose_vector_init_constant):
New function, split out from...
(aarch64_expand_vector_init_fallback): ...here. Use a bit-
reversed increment to find a constant index. Add support for
stepped constants.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/dupq_12.c: New test.
|
|
This doesn't enable anything within the compiler, but it allows the
flag to be passed through to the assembler. There also doesn't appear to be a
kernel cpuinfo name for it yet.
gcc/ChangeLog:
* config/aarch64/aarch64-arches.def (V9_5A): Add CPA.
* config/aarch64/aarch64-option-extensions.def (CPA): New.
* doc/invoke.texi: Document +cpa.
|
|
gcc/ChangeLog:
* config/aarch64/aarch64-arches.def (V9_5A): New.
* doc/invoke.texi: Document armv9.5-a option.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/armv9p5.c: New test.
|
|
This feature flag bit only exists to support the +crypto alias. Outside
of option processing this bit needs to be set or unset consistently.
This patch goes with the latter option.
gcc/ChangeLog:
* common/config/aarch64/aarch64-common.cc: Assert that CRYPTO
bit is not set.
* config/aarch64/aarch64-feature-deps.h
(info<FEAT>.explicit_on): Unset CRYPTO bit.
(cpu_##CORE_IDENT): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/crypto-alias-1.c: New test.
|
|
Use aarch64_validate_cpu instead of the existing duplicate (and worse)
version of the -mcpu parsing code.
The original code used fatal_error; I'm guessing that using error
instead should be ok.
gcc/ChangeLog:
* common/config/aarch64/aarch64-common.cc
(aarch64_rewrite_selected_cpu): Refactor and inline into...
(aarch64_rewrite_mcpu): this.
* config/aarch64/aarch64-protos.h
(aarch64_rewrite_selected_cpu): Delete.
|
|
Add infrastructure to allow rewriting the architecture strings passed to
the assembler (either as -march options or .arch directives). There was
already canonicalisation everywhere except for an -march driver option
passed directly to the compiler; this patch applies the same
canonicalisation there as well.
gcc/ChangeLog:
* common/config/aarch64/aarch64-common.cc
(aarch64_get_arch_string_for_assembler): New.
(aarch64_rewrite_march): New.
(aarch64_rewrite_selected_cpu): Call new function.
* config/aarch64/aarch64-elf.h (ASM_SPEC): Remove identity mapping.
* config/aarch64/aarch64-protos.h
(aarch64_get_arch_string_for_assembler): New.
* config/aarch64/aarch64.cc
(aarch64_declare_function_name): Call new function.
(aarch64_start_file): Ditto.
* config/aarch64/aarch64.h
(EXTRA_SPEC_FUNCTIONS): Use new macro name.
(MCPU_TO_MARCH_SPEC): Rename to...
(MARCH_REWRITE_SPEC): ...this, and extend the spec rule.
(aarch64_rewrite_march): New declaration.
(MCPU_TO_MARCH_SPEC_FUNCTIONS): Rename to...
(AARCH64_BASE_SPEC_FUNCTIONS): ...this, and add new function.
(ASM_CPU_SPEC): Use new macro name.
|
|
Aside from moving the functions, the only changes are to make them
non-static, and to use the existing info arrays within aarch64-common.cc
instead of the info arrays remaining in aarch64.cc.
gcc/ChangeLog:
* common/config/aarch64/aarch64-common.cc
(aarch64_get_all_extension_candidates): Move within file.
(aarch64_print_hint_candidates): Move from aarch64.cc.
(aarch64_print_hint_for_extensions): Ditto.
(aarch64_print_hint_for_arch): Ditto.
(aarch64_print_hint_for_core): Ditto.
(enum aarch_parse_opt_result): Ditto.
(aarch64_parse_arch): Ditto.
(aarch64_parse_cpu): Ditto.
(aarch64_parse_tune): Ditto.
(aarch64_validate_march): Ditto.
(aarch64_validate_mcpu): Ditto.
(aarch64_validate_mtune): Ditto.
* config/aarch64/aarch64-protos.h
(aarch64_rewrite_selected_cpu): Move within file.
(aarch64_print_hint_for_extensions): Share function prototype.
(aarch64_print_hint_for_arch): Ditto.
(aarch64_print_hint_for_core): Ditto.
(enum aarch_parse_opt_result): Ditto.
(aarch64_validate_march): Ditto.
(aarch64_validate_mcpu): Ditto.
(aarch64_validate_mtune): Ditto.
(aarch64_get_all_extension_candidates): Unshare prototype.
* config/aarch64/aarch64.cc
(aarch64_parse_arch): Move to aarch64-common.cc.
(aarch64_parse_cpu): Ditto.
(aarch64_parse_tune): Ditto.
(aarch64_print_hint_candidates): Ditto.
(aarch64_print_hint_for_core): Ditto.
(aarch64_print_hint_for_arch): Ditto.
(aarch64_print_hint_for_extensions): Ditto.
(aarch64_validate_mcpu): Ditto.
(aarch64_validate_march): Ditto.
(aarch64_validate_mtune): Ditto.
|
|
It seems odd that we add "native" to the list for -march but not for
-mcpu. This is probably a bug, but for now we'll preserve the existing
behaviour.
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_print_hint_candidates): New helper function.
(aarch64_print_hint_for_core_or_arch): Inline into callers.
(aarch64_print_hint_for_core): Inline callee and use new helper.
(aarch64_print_hint_for_arch): Ditto.
(aarch64_print_hint_for_extensions): Use new helper.
|
|
Replace `const struct processor *` in output parameters with
`aarch64_arch` or `aarch64_cpu`.
Replace the `std::string` parameter in aarch64_print_hint_for_extensions with
`char *`.
Also name the return parameters more clearly and consistently.
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_print_hint_for_extensions): Receive string as a char *.
(aarch64_parse_arch): Don't return a const struct processor *.
(aarch64_parse_cpu): Ditto.
(aarch64_parse_tune): Ditto.
(aarch64_validate_mtune): Ditto.
(aarch64_validate_mcpu): Ditto, and use temporary variables for
march/mcpu cross-check.
(aarch64_validate_march): Ditto.
(aarch64_override_options): Adjust for changed parameter types.
(aarch64_handle_attr_arch): Ditto.
(aarch64_handle_attr_cpu): Ditto.
(aarch64_handle_attr_tune): Ditto.
|
|
Replace `enum aarch64_processor` and `enum target_cpus` with
`enum aarch64_cpu`, and prefix the entries with `AARCH64_CPU_`.
Also rename aarch64_none to aarch64_no_cpu.
gcc/ChangeLog:
* config/aarch64/aarch64-opts.h
(enum aarch64_processor): Rename to...
(enum aarch64_cpu): ...this, and rename the entries.
* config/aarch64/aarch64.cc
(aarch64_type): Rename type and initial value.
(struct processor): Rename member types.
(all_architectures): Rename enum members.
(all_cores): Ditto.
(aarch64_get_tune_cpu): Rename type and enum member.
* config/aarch64/aarch64.h (enum target_cpus): Remove.
(TARGET_CPU_DEFAULT): Rename default value.
(aarch64_tune): Rename type.
* config/aarch64/aarch64.opt:
(selected_tune): Rename type and default value.
|
|
Features from a cpu or base architecture that were explicitly disabled
by a +nofeat option were being incorrectly added back in before checking
for conflicts between -mcpu and -march options. This patch instead
compares the returned feature masks directly.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_override_options): Compare
returned feature masks directly.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/target_attr_crypto_ice_1.c: Prune warning.
* gcc.target/aarch64/target_attr_crypto_ice_2.c: Ditto.
|
|
This patch attempts to fix an error when building module std. The reason for
the error is that __builtin_va_list (aka struct __va_list) has internal linkage,
so mark this builtin type as TREE_PUBLIC so that struct __va_list gets
external linkage.
g++ -fmodules -std=c++23 -fsearch-include-path bits/std.cc -c
std.cc:3642:14:error: exporting ‘typedef __gnuc_va_list va_list’ that does not have external linkage
3642 | using std::va_list;
| ^~~~~~~
<built-in>: note: ‘struct __va_list’ declared here with internal linkage
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_build_builtin_va_list): Mark
__builtin_va_list as TREE_PUBLIC.
* config/arm/arm.cc (arm_build_builtin_va_list): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/modules/builtin-8.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
Earlier, we were gating SVE2 faminmax behind sve+faminmax. This was
incorrect and this patch changes it so that it is gated behind
sve2+faminmax.
gcc/ChangeLog:
* config/aarch64/aarch64-sve2.md:
(*aarch64_pred_faminmax_fused): Fix to use the correct flags.
* config/aarch64/aarch64.h
(TARGET_SVE_FAMINMAX): Remove.
* config/aarch64/iterators.md: Fix iterators so that famax and
famin use correct flags.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/faminmax_1.c: Fix test to use the
correct flags.
* gcc.target/aarch64/sve/faminmax_2.c: Fix test to use the
correct flags.
* gcc.target/aarch64/sve/faminmax_3.c: New test.
|
|
GCC 15 is the first release to support FP8 intrinsics.
The underlying instructions depend on the value of a new register,
FPMR. Unlike FPCR, FPMR is a normal call-clobbered/caller-save
register rather than a global register. So:
- The FP8 intrinsics take a final uint64_t argument that
specifies what value FPMR should have.
- If an FP8 operation is split across multiple functions,
it is likely that those functions would have a similar argument.
If the object code has the structure:
for (...)
fp8_kernel (..., fpmr_value);
then fp8_kernel would set FPMR to fpmr_value each time it is
called, even though FPMR will already have that value for at
least the second and subsequent calls (and possibly the first).
The working assumption for the ABI has been that writes to
registers like FPMR can in general be more expensive than
reads and so it would be better to use a conditional write like:
mrs tmp, fpmr
cmp tmp, <value>
beq 1f
msr fpmr, <value>
1:
instead of writing the same value to FPMR repeatedly.
This patch implements that. It also adds a tuning flag that suppresses
the behaviour, both to make testing easier and to support any future
cores that (for example) are able to rename FPMR.
Hopefully this really is the last part of the FP8 enablement.
gcc/
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE): New tuning flag.
* config/aarch64/aarch64.h (TARGET_CHEAP_FPMR_WRITE): New macro.
* config/aarch64/aarch64.md: Split moves into FPMR into a test
and branch around.
(aarch64_write_fpmr): New pattern.
gcc/testsuite/
* g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Add
cheap_fpmr_write by default.
* gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
* gcc.target/aarch64/acle/fp8.c: Add cheap_fpmr_write.
* gcc.target/aarch64/acle/fpmr-2.c: Likewise.
* gcc.target/aarch64/simd/vcvt_fpm.c: Likewise.
* gcc.target/aarch64/simd/vdot2_fpm.c: Likewise.
* gcc.target/aarch64/simd/vdot4_fpm.c: Likewise.
* gcc.target/aarch64/simd/vmla_fpm.c: Likewise.
* gcc.target/aarch64/acle/fpmr-6.c: New test.
|
|
GCC 15 is going to be the first release to support FPMR.
While working on a follow-up patch, I noticed that for:
(set (reg:DI R) ...)
...
(set (reg:DI fpmr) (reg:DI R))
IRA would prefer to spill R to memory rather than allocate a GPR.
This is because the register move cost for GENERAL_REGS to
MOVEABLE_SYSREGS is very high:
/* Moves to/from sysregs are expensive, and must go via GPR. */
if (from == MOVEABLE_SYSREGS)
return 80 + aarch64_register_move_cost (mode, GENERAL_REGS, to);
if (to == MOVEABLE_SYSREGS)
return 80 + aarch64_register_move_cost (mode, from, GENERAL_REGS);
but the memory cost for MOVEABLE_SYSREGS was the same as for
GENERAL_REGS, making memory much cheaper.
Loading and storing FPMR involves a GPR temporary, so the cost should
account for moving into and out of that temporary.
This did show up indirectly in some of the existing asm tests,
where the stack frame allocated 16 bytes for callee saves (D8)
and another 16 bytes for spilling a temporary register.
It's possible that other registers need the same treatment
and it's more than probable that this code needs a rework.
None of that seems suitable for stage 4 though.
gcc/
* config/aarch64/aarch64.cc (aarch64_memory_move_cost): Account
for the cost of moving in and out of GENERAL_SYSREGS.
gcc/testsuite/
* gcc.target/aarch64/acle/fpmr-5.c: New test.
* gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Don't expect
a spill slot to be allocated.
* gcc.target/aarch64/sve2/acle/asm/mlalb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalt_lane_mf8.c: Likewise.
|
|
GCC 15 is going to be the first release to support FPMR.
The alternatives for moving values into FPMR were missing
a zero alternative, meaning that moves of zero would use an
unnecessary temporary register.
gcc/
* config/aarch64/aarch64.md (*mov<SHORT:mode>_aarch64)
(*movsi_aarch64, *movdi_aarch64): Allow the source of an MSR
to be zero.
gcc/testsuite/
* gcc.target/aarch64/acle/fp8.c: Add tests for moving zero into FPMR.
|
|
While working on another MSR-related patch, I noticed that
aarch64_write_sysregdi's constraints allowed zero, but its
predicate didn't. This could in principle lead to an ICE
during or after RA, since "Z" allows the RA to rematerialise
a known zero directly into the instruction.
The usual techniques for exposing a bug like that didn't work in this
case, since the optimisers seem to make no attempt to remove redundant
zero moves (at least not for these unspec_volatiles). But the problem
still seems worth fixing pre-emptively.
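A minimal sketch of the kind of code affected (my example; the actual
rwsr-4.c test is not reproduced here), using the ACLE system-register
intrinsics from arm_acle.h:
  #include <arm_acle.h>

  void
  clear_tpidr (void)
  {
    /* The constant zero satisfies the "Z" constraint, so after the fix
       the predicate also accepts it and the MSR can use xzr directly.  */
    __arm_wsr64 ("tpidr_el0", 0);
  }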
gcc/
* config/aarch64/aarch64.md (aarch64_write_sysregdi): Change
the source predicate to aarch64_reg_or_zero.
gcc/testsuite/
* gcc.target/aarch64/acle/rwsr-4.c: New test.
* gcc.target/aarch64/acle/rwsr-armv8p9.c: Avoid read of uninitialized
variable.
|
|
This updates aarch64.opt.urls after my patch earlier today.
Pushing directly as it's an obvious fix.
gcc/ChangeLog:
* config/aarch64/aarch64.opt.urls: Regenerate.
|
|
This patch introduces support for LUTI2/LUTI4 ACLE for SVE2.
LUTI instructions are used for efficient table lookups with 2-bit
or 4-bit indices. LUTI2 reads indexed 8-bit or 16-bit elements from
the low 128 bits of the table vector using packed 2-bit indices,
while LUTI4 can read from the low 128 or 256 bits of the table
vector or from two table vectors using packed 4-bit indices.
These instructions fill the destination vector by copying elements
indexed by segments of the source vector, selected by the vector
segment index.
The changes include the addition of a new AArch64 option
extension "lut", __ARM_FEATURE_LUT preprocessor macro, definitions
for the new LUTI instruction shapes, and implementations of the
svluti2 and svluti4 builtins.
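A minimal usage sketch (my example, assuming the ACLE prototype
svluti2_lane_u8 (table, indices, imm) and that the new +lut extension is
enabled on top of SVE2):
  #include <arm_sve.h>
  #pragma GCC target "+sve2+lut"

  svuint8_t
  lookup (svuint8_t table, svuint8_t indices)
  {
    /* Read 8-bit elements from the low 128 bits of TABLE using packed
       2-bit indices taken from segment 0 of INDICES.  */
    return svluti2_lane_u8 (table, indices, 0);
  }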
gcc/ChangeLog:
* config/aarch64/aarch64-c.cc
(aarch64_update_cpp_builtins): Add new flag TARGET_LUT.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(struct luti_base): Shape for lut intrinsics.
(SHAPE): Specializations for lut shapes for luti2 and luti4.
* config/aarch64/aarch64-sve-builtins-shapes.h: Declare lut
intrinsics.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(class svluti_lane_impl): Define expand for lut intrinsics.
(FUNCTION): Define expand for lut intrinsics.
* config/aarch64/aarch64-sve-builtins-sve2.def
(REQUIRED_EXTENSIONS): Declare lut intrinsics behind lut flag.
(svluti2_lane): Define intrinsic behind flag.
(svluti4_lane): Define intrinsic behind flag.
* config/aarch64/aarch64-sve-builtins-sve2.h: Declare lut
intrinsics.
* config/aarch64/aarch64-sve-builtins.cc
(TYPES_bh_data): New type for byte and halfword.
(bh_data): Type array for byte and halfword.
(h_data): Type array for halfword.
* config/aarch64/aarch64-sve2.md
(@aarch64_sve_luti<LUTI_BITS><mode>): Instruction patterns for
lut intrinsics.
* config/aarch64/iterators.md: Iterators and attributes for lut
intrinsics.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: New test
macro.
* lib/target-supports.exp: Add lut flag to the for loop.
* gcc.target/aarch64/sve/acle/general-c/lut_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/lut_2.c: New test.
* gcc.target/aarch64/sve/acle/general-c/lut_3.c: New test.
* gcc.target/aarch64/sve/acle/general-c/lut_4.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_s16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_s8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_u16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_u8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_bf16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_f16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u8.c: New test.
|
|
This patch adds a warning when FMV is used for AArch64.
The reasoning for this is that the ACLE [1] spec for FMV has diverged
significantly from the current implementation and we want to prevent
potential future compatibility issues.
There is a patch for an ACLE-compliant version of target_version and
target_clone in progress, but it won't make gcc-15.
This has been bootstrapped and regression tested for AArch64.
Is this okay for master and backport to gcc-14?
[1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_process_target_version_attr): Add experimental warning.
* config/aarch64/aarch64.opt: Add command line option to disable
warning.
* doc/invoke.texi: Add documentation for -W[no-]experimental-fmv-target.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/mv-1.C: Add CLI flag.
* g++.target/aarch64/mv-symbols1.C: Add CLI flag.
* g++.target/aarch64/mv-symbols2.C: Add CLI flag.
* g++.target/aarch64/mv-symbols3.C: Add CLI flag.
* g++.target/aarch64/mv-symbols4.C: Add CLI flag.
* g++.target/aarch64/mv-symbols5.C: Add CLI flag.
* g++.target/aarch64/mv-warning1.C: New test.
* g++.target/aarch64/mvc-symbols1.C: Add CLI flag.
* g++.target/aarch64/mvc-symbols2.C: Add CLI flag.
* g++.target/aarch64/mvc-symbols3.C: Add CLI flag.
* g++.target/aarch64/mvc-symbols4.C: Add CLI flag.
* g++.target/aarch64/mv-pragma.C: Add CLI flag.
* g++.target/aarch64/mvc-warning1.C: New test.
|
|
In the testcase, we try to use xorsign on:
(subreg:DF (reg:TI R) 8)
i.e. the highpart of the TI. xorsign wants to take a V2DF
paradoxical subreg of this, which is rightly rejected as a direct
operation. In cases like this, we need to force the highpart into
a fresh register first.
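One plausible way to hit this (an assumption on my part; the actual
pr118501.c testcase may differ) is to multiply the high half of a 128-bit
integer, reinterpreted as a double, by a copysign of 1.0, which is the form
the xorsign expansion handles:
  double
  f (unsigned __int128 x, double y)
  {
    double hi;
    /* The high 64 bits of X live in the upper half of a TImode register,
       giving (subreg:DF (reg:TI ...) 8) once the copy is forwarded.  */
    __builtin_memcpy (&hi, (char *) &x + 8, sizeof hi);
    return hi * __builtin_copysign (1.0, y);
  }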
gcc/
PR target/118501
* config/aarch64/aarch64.md (@xorsign<mode>3): Use
force_lowpart_subreg.
gcc/testsuite/
PR target/118501
* gcc.c-torture/compile/pr118501.c: New test.
|
|
In g:b096a6ebe9d9f9fed4c105f6555f724eb32af95c I'd forgotten
to gate some uses of INS on TARGET_SIMD.
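A rough sketch of the kind of bitfield insertion these patterns match (my
example; the new *a tests presumably check the behaviour when SIMD is
disabled, e.g. with +nosimd):
  struct bits
  {
    unsigned long long lo : 16;
    unsigned long long hi : 48;
  };

  void
  set_hi (struct bits *p, unsigned long long x)
  {
    /* Insertions like this can match the INS-based bfi patterns, which
       must only be used when SIMD is available.  */
    p->hi = x;
  }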
gcc/
PR target/118531
* config/aarch64/aarch64.md (*insv_reg<mode>_<SUBDI_BITS>)
(*aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>)
(*aarch64_bfidi<ALLX:mode>_subreg_<SUBDI_BITS>): Add missing
simd requirements.
gcc/testsuite/
* gcc.target/aarch64/ins_bitfield_1a.c: New test.
* gcc.target/aarch64/ins_bitfield_3a.c: Likewise.
* gcc.target/aarch64/ins_bitfield_5a.c: Likewise.
|
|
This renames the existing {s,u}q{add,sub} instructions to use the
standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and
IFN_SAT_SUB.
The NEON intrinsics for saturating arithmetic and their corresponding
builtins are changed to use these standard names too.
Using the standard names for the instructions causes 32 and 64-bit
unsigned scalar saturating arithmetic to use the NEON instructions,
resulting in an additional (and inefficient) FMOV to be generated when
the original operands are in GP registers. This patch therefore also
restores the original behaviour of using the adds/subs instructions
in this circumstance.
Additional tests are written for the scalar and Adv. SIMD cases to
ensure that the correct instructions are used. The NEON intrinsics are
already tested elsewhere.
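For illustration (my sketch; the real tests live in the files listed below),
a 64-bit unsigned saturating add in a form that can be recognised as
IFN_SAT_ADD, which should stay on the general registers using adds/csinv:
  #include <stdint.h>

  uint64_t
  sat_add_u64 (uint64_t a, uint64_t b)
  {
    uint64_t sum;
    /* Saturate to UINT64_MAX on overflow; for GPR operands this should
       use adds/csinv rather than a round trip through SIMD registers.  */
    return __builtin_add_overflow (a, b, &sum) ? UINT64_MAX : sum;
  }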
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc: Expand iterators.
* config/aarch64/aarch64-simd-builtins.def: Use standard names
* config/aarch64/aarch64-simd.md: Use standard names, split insn
definitions on signedness of operator and type of operands.
* config/aarch64/arm_neon.h: Use standard builtin names.
* config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
simplify splitting of insn for unsigned scalar arithmetic.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/scalar_intrinsics.c: Update testcases.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
Template file for unsigned vector saturating arithmetic tests.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
8-bit vector type tests.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
16-bit vector type tests.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
32-bit vector type tests.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
64-bit vector type tests.
* gcc.target/aarch64/saturating_arithmetic.inc: Template file
for scalar saturating arithmetic tests.
* gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
* gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
* gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
* gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
Co-authored-by: Tamar Christina <tamar.christina@arm.com>
|
|
Rename the existing SVE unpredicated saturating arithmetic instructions
to use standard names which are used by IFN_SAT_ADD and IFN_SAT_SUB.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md: Rename insns
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/saturating_arithmetic.inc:
Template file for auto-vectorizer tests.
* gcc.target/aarch64/sve/saturating_arithmetic_1.c:
Instantiate 8-bit vector tests.
* gcc.target/aarch64/sve/saturating_arithmetic_2.c:
Instantiate 16-bit vector tests.
* gcc.target/aarch64/sve/saturating_arithmetic_3.c:
Instantiate 32-bit vector tests.
* gcc.target/aarch64/sve/saturating_arithmetic_4.c:
Instantiate 64-bit vector tests.
|
|
This reverts commit 5f5833a4107ddfbcd87651bf140151de043f4c36.
|
|
This reverts commit 26b2d9f27ca24f0705641a85f29d179fa0600869.
|
|
Rename the existing SVE unpredicated saturating arithmetic instructions
to use standard names which are used by IFN_SAT_ADD and IFN_SAT_SUB.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md: Rename insns
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/saturating_arithmetic.inc:
Template file for auto-vectorizer tests.
* gcc.target/aarch64/sve/saturating_arithmetic_1.c:
Instantiate 8-bit vector tests.
* gcc.target/aarch64/sve/saturating_arithmetic_2.c:
Instantiate 16-bit vector tests.
* gcc.target/aarch64/sve/saturating_arithmetic_3.c:
Instantiate 32-bit vector tests.
* gcc.target/aarch64/sve/saturating_arithmetic_4.c:
Instantiate 64-bit vector tests.
|
|
This renames the existing {s,u}q{add,sub} instructions to use the
standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and
IFN_SAT_SUB.
The NEON intrinsics for saturating arithmetic and their corresponding
builtins are changed to use these standard names too.
Using the standard names for the instructions causes 32 and 64-bit
unsigned scalar saturating arithmetic to use the NEON instructions,
resulting in an additional (and inefficient) FMOV to be generated when
the original operands are in GP registers. This patch therefore also
restores the original behaviour of using the adds/subs instructions
in this circumstance.
Additional tests are written for the scalar and Adv. SIMD cases to
ensure that the correct instructions are used. The NEON intrinsics are
already tested elsewhere.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc: Expand iterators.
* config/aarch64/aarch64-simd-builtins.def: Use standard names
* config/aarch64/aarch64-simd.md: Use standard names, split insn
definitions on signedness of operator and type of operands.
* config/aarch64/arm_neon.h: Use standard builtin names.
* config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
simplify splitting of insn for unsigned scalar arithmetic.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/scalar_intrinsics.c: Update testcases.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
Template file for unsigned vector saturating arithmetic tests.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
8-bit vector type tests.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
16-bit vector type tests.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
32-bit vector type tests.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
64-bit vector type tests.
* gcc.target/aarch64/saturating_arithmetic.inc: Template file
for scalar saturating arithmetic tests.
* gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
* gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
* gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
* gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
Co-authored-by: Tamar Christina <tamar.christina@arm.com>
|
|
In g:e91a17fe39c39e98cebe6e1cbc8064ee6846a3a7 we added the ability for
-mcpu=native on unknown CPUs to still enable architecture extensions.
This has worked great, but was only added for homogeneous systems.
However, the same approach works for big.LITTLE, since in such a system the
cores must have the same extensions or it fundamentally doesn't work,
i.e. task migration from one core to the other wouldn't work.
This extends the same handling to non-homogeneous systems.
gcc/ChangeLog:
PR target/113257
* config/aarch64/driver-aarch64.cc (get_cpu_from_id, DEFAULT_CPU): New.
(host_detect_local_cpu): Use it.
gcc/testsuite/ChangeLog:
PR target/113257
* gcc.target/aarch64/cpunative/info_34: New test.
* gcc.target/aarch64/cpunative/native_cpu_34.c: New test.
* gcc.target/aarch64/cpunative/info_35: New test.
* gcc.target/aarch64/cpunative/native_cpu_35.c: New test.
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
|
|
When both -mcpu and -march are specified, the value of -march wins out.
This is done correctly for the calls to cc1 and for the assembler directives we
put out in assembly files.
However, in the call to as we don't do this and instead use the arch from the
cpu. This leads to a situation where GCC cannot reliably be used to compile
assembly files which don't have a .arch directive.
This is quite common with .S files which use macros to selectively enable
code paths based on what the preprocessor sees.
The fix is to change MCPU_TO_MARCH_SPEC to not override the march if a -march
is already specified.
gcc/ChangeLog:
PR target/110901
* config/aarch64/aarch64.h (MCPU_TO_MARCH_SPEC): Don't override if
march is set.
gcc/testsuite/ChangeLog:
PR target/110901
* gcc.target/aarch64/options_set_29.c: New test.
|
|
Fix the neoverse512tvb tuning to be like Neoverse V1/V2 and add the
missing AARCH64_EXTRA_TUNE_BASE and AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
gcc:
* config/aarch64/tuning_models/neoverse512tvb.h (tune_flags): Update.
|
|
Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is
already enabled for some cores, but benchmarking it shows it is faster on all
modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1).
gcc:
* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE):
Add AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
* config/aarch64/tuning_models/ampere1b.h: Remove redundant
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
* config/aarch64/tuning_models/neoversev2.h: Likewise.
|
|
ILP32 was originally intended to make porting to AArch64 easier. Support was
never merged in the Linux kernel or GLIBC, so it has been unsupported for many
years. There isn't a benefit in keeping unsupported features forever, so
deprecate it now (and it could be removed in a future release).
gcc:
* config/aarch64/aarch64.cc (aarch64_override_options): Add warning.
* doc/invoke.texi: Document -mabi=ilp32 as deprecated.
gcc/testsuite:
* gcc.target/aarch64/inline-mem-set-pr112804.c: Add -Wno-deprecated.
* gcc.target/aarch64/pr100518.c: Likewise.
* gcc.target/aarch64/pr113114.c: Likewise.
* gcc.target/aarch64/pr80295.c: Likewise.
* gcc.target/aarch64/pr94201.c: Likewise.
* gcc.target/aarch64/pr94577.c: Likewise.
* gcc.target/aarch64/sve/pr108603.c: Likewise.
|
|
The Parts Num field of the MIDR for the Cortex-X4 is wrong. It's currently the
parts number for a Cortex-A720 (whose entry does have the right number).
The correct number can be found in the Cortex-X4 Technical Reference Manual [1]
on page 382 in Issue Number 5.
[1] https://developer.arm.com/documentation/102484/latest/
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): Fix cortex-x4 parts
num.
|
|
This pass is used to optimise assignments to the FPMR register in
aarch64. I chose to implement this as a middle-end pass because it
mostly reuses the existing RTL PRE code within gcse.cc.
Compared to RTL PRE, the key difference in this new pass is that we
insert new writes directly to the destination hardreg, instead of
writing to a new pseudo-register and copying the result later. This
requires changes to the analysis portion of the pass, because sets
cannot be moved before existing instructions that set, use or clobber
the hardreg, and the value becomes unavailable after any uses or
clobbers of the hardreg.
Any uses of the hardreg in debug insns will be deleted. We could do
better than this, but for the aarch64 fpmr I don't think we emit useful
debuginfo for deleted fp8 instructions anyway (and I don't even know if
it's possible to have a debug fpmr use when entering hardreg PRE).
gcc/ChangeLog:
* config/aarch64/aarch64.h (HARDREG_PRE_REGNOS): New macro.
* gcse.cc (doing_hardreg_pre_p): New global variable.
(do_load_motion): New boolean check.
(current_hardreg_regno): New global variable.
(compute_local_properties): Unset transp for hardreg clobbers.
(prune_hardreg_uses): New function.
(want_to_gcse_p): Use different checks for hardreg PRE.
(oprs_unchanged_p): Disable load motion for hardreg PRE pass.
(hash_scan_set): For hardreg PRE, skip non-hardreg sets and
check for hardreg clobbers.
(record_last_mem_set_info): Skip for hardreg PRE.
(compute_pre_data): Prune hardreg uses from transp bitmap.
(pre_expr_reaches_here_p_work): Add sentence to comment.
(insert_insn_start_basic_block): New functions.
(pre_edge_insert): Don't add hardreg sets to predecessor block.
(pre_delete): Use hardreg for the reaching reg.
(reset_hardreg_debug_uses): New function.
(pre_gcse): For hardreg PRE, reset debug uses and don't insert
copies.
(one_pre_gcse_pass): Disable load motion for hardreg PRE.
(execute_hardreg_pre): New.
(class pass_hardreg_pre): New.
(pass_hardreg_pre::gate): New.
(make_pass_hardreg_pre): New.
* passes.def (pass_hardreg_pre): New pass.
* tree-pass.h (make_pass_hardreg_pre): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/acle/fpmr-1.c: New test.
* gcc.target/aarch64/acle/fpmr-2.c: New test.
* gcc.target/aarch64/acle/fpmr-3.c: New test.
* gcc.target/aarch64/acle/fpmr-4.c: New test.
|
|
GCC does not emit tlbi instructions, so this only affects the flags
passed through to the assembler.
gcc/ChangeLog:
* config/aarch64/aarch64-arches.def (V8_7A): Add XS.
* config/aarch64/aarch64-option-extensions.def (XS): New flag.
|
|
GCC does not currently emit the wfet or wfit instructions, so this
primarily affects the flags passed through to the assembler.
gcc/ChangeLog:
* config/aarch64/aarch64-arches.def (V8_7A): Add WFXT.
* config/aarch64/aarch64-option-extensions.def (WFXT): New flag.
|
|
gcc/ChangeLog:
* config/aarch64/aarch64-arches.def (V8_4A): Add RCPC2.
* config/aarch64/aarch64-option-extensions.def
(RCPC2): New flag.
(RCPC3): Add RCPC2 dependency.
* config/aarch64/aarch64.h (TARGET_RCPC2): Use new flag.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cpunative/native_cpu_21.c: Add rcpc2 to
expected feature string instead of rcpc.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
|
|
GCC does not currently emit the axflag or xaflag instructions, so this
primarily affects the flags passed through to the assembler.
gcc/ChangeLog:
* config/aarch64/aarch64-arches.def (V8_5A): Add FLAGM2.
* config/aarch64/aarch64-option-extensions.def (FLAGM2): New flag.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cpunative/native_cpu_21.c: Add flagm2 to
expected feature string instead of flagm.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
|
|
gcc/ChangeLog:
* config/aarch64/aarch64-arches.def (V8_5A): Add FRINTTS.
* config/aarch64/aarch64-option-extensions.def (FRINTTS): New flag.
* config/aarch64/aarch64.h (TARGET_FRINT): Use new flag.
* config/aarch64/arm_acle.h: Use new flag for frintts intrinsics.
* config/aarch64/arm_neon.h: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cpunative/native_cpu_21.c: Add frintts to
expected feature string.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
|
|
gcc/ChangeLog:
* config/aarch64/aarch64-arches.def (V8_3A): Add JSCVT.
* config/aarch64/aarch64-option-extensions.def (JSCVT): New flag.
* config/aarch64/aarch64.h (TARGET_JSCVT): Use new flag.
* config/aarch64/arm_acle.h: Use new flag for jscvt intrinsics.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cpunative/native_cpu_21.c: Add jscvt to
expected feature string.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
|
|
This includes +fcma as a dependency of +sve, and means that we can
finally support fcma intrinsics on a64fx.
Also add fcma to the Features list in several cpunative testcases that
incorrectly included sve without fcma.
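For example (a minimal sketch; requires a target with +fcma enabled, as is
now the case when compiling for a64fx), a complex multiply-accumulate
intrinsic:
  #include <arm_neon.h>
  #pragma GCC target "+fcma"

  float32x4_t
  cmla0 (float32x4_t acc, float32x4_t a, float32x4_t b)
  {
    /* FCMLA with a 0-degree rotation; usable on a64fx now that +sve
       pulls in +fcma.  */
    return vcmlaq_f32 (acc, a, b);
  }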
gcc/ChangeLog:
* config/aarch64/aarch64-arches.def (V8_3A): Add FCMA.
* config/aarch64/aarch64-option-extensions.def (FCMA): New flag.
(SVE): Add FCMA dependency.
* config/aarch64/aarch64.h (TARGET_COMPLEX): Use new flag.
* config/aarch64/arm_neon.h: Use new flag for fcma intrinsics.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cpunative/info_15: Add fcma to Features.
* gcc.target/aarch64/cpunative/info_16: Ditto.
* gcc.target/aarch64/cpunative/info_17: Ditto.
* gcc.target/aarch64/cpunative/info_8: Ditto.
* gcc.target/aarch64/cpunative/info_9: Ditto.
|
|
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_expand_epilogue): Use TARGET_PAUTH.
* config/aarch64/aarch64.md: Update comment.
|