Age | Commit message | Author | Files | Lines
|
The following patch fixes UB in the compiler when negating
a CONST_INT containing HOST_WIDE_INT_MIN. I've changed the spots where
there wasn't an obvious earlier condition check or predicate that
would fail for such CONST_INTs.
2021-04-27 Jakub Jelinek <jakub@redhat.com>
PR target/100200
* config/aarch64/predicates.md (aarch64_sub_immediate,
aarch64_plus_immediate): Use -UINTVAL instead of -INTVAL.
* config/aarch64/aarch64.md (casesi, rotl<mode>3): Likewise.
* config/aarch64/aarch64.c (aarch64_print_operand,
aarch64_split_atomic_op, aarch64_expand_subvti): Likewise.
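For illustration, a minimal C sketch of the UB being avoided (assuming
HOST_WIDE_INT is a 64-bit signed type, as is typical):
    /* Negating the most negative value of a signed type overflows,
       which is undefined behaviour.  */
    long long neg_ub (long long x) { return -x; }  /* UB if x == LLONG_MIN */
    /* Negating the value converted to the unsigned type wraps modulo
       2^64 and is well defined; this is what -UINTVAL amounts to.  */
    unsigned long long neg_ok (long long x) { return -(unsigned long long) x; }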
|
|
In this bug, combine forms the (R)SHRN(2) instructions with an invalid shift amount.
The intrinsic expanders for these patterns validate the right shift amount, but if
the final patterns end up being matched by combine (or other RTL passes, I suppose)
they still let the wrong const_vector through.
This patch tightens up the predicates for the instructions involved by adding
mode-specific predicates for the right-shift-amount const_vectors.
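As a rough sketch of the rule the new predicates enforce (hypothetical
helper, not part of the patch): for a narrowing shift to N-bit elements,
every element of the shift-amount const_vector must lie in [1, N].
    /* E.g. SHRN to byte elements allows shift amounts 1..8 only.  */
    static int
    valid_shrn_amount (int elem_bits, int amount)
    {
      return amount >= 1 && amount <= elem_bits;
    }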
gcc/ChangeLog:
PR target/99437
* config/aarch64/predicates.md (aarch64_simd_shift_imm_vec_qi): Define.
(aarch64_simd_shift_imm_vec_hi): Likewise.
(aarch64_simd_shift_imm_vec_si): Likewise.
(aarch64_simd_shift_imm_vec_di): Likewise.
* config/aarch64/aarch64-simd.md (aarch64_shrn<mode>_insn_le): Use
predicate from above.
(aarch64_shrn<mode>_insn_be): Likewise.
(aarch64_rshrn<mode>_insn_le): Likewise.
(aarch64_rshrn<mode>_insn_be): Likewise.
(aarch64_shrn2<mode>_insn_le): Likewise.
(aarch64_shrn2<mode>_insn_be): Likewise.
(aarch64_rshrn2<mode>_insn_le): Likewise.
(aarch64_rshrn2<mode>_insn_be): Likewise.
gcc/testsuite/ChangeLog:
PR target/99437
* gcc.target/aarch64/simd/pr99437.c: New test.
|
|
The type attribute "alu_shift_imm" is subdivided into
"alu_shift_imm_lsl_1to4" and "alu_shift_imm_other", to accommodate
optimizations of some microarchitectures.
The detailed discussion can be found here:
https://gcc.gnu.org/pipermail/gcc/2020-September/233594.html
gcc/
* config/arm/types.md (define_attr "autodetect_type"): New.
(define_attr "type"): Subdivide alu_shift_imm.
* config/arm/common.md: New file.
* config/aarch64/predicates.md: Include common.md.
* config/arm/predicates.md: Include common.md.
* config/aarch64/aarch64.md (*add_<shift>_<mode>): Set autodetect_type.
(*add_<shift>_si_uxtw): Likewise.
(*sub_<shift>_<mode>): Likewise.
(*sub_<shift>_si_uxtw): Likewise.
(*neg_<shift>_<mode>2): Likewise.
(*neg_<shift>_si2_uxtw): Likewise.
* config/arm/arm.md (*addsi3_carryin_shift): Likewise.
(add_not_shift_cin): Likewise.
(*subsi3_carryin_shift): Likewise.
(*subsi3_carryin_shift_alt): Likewise.
(*rsbsi3_carryin_shift): Likewise.
(*rsbsi3_carryin_shift_alt): Likewise.
(*arm_shiftsi3): Likewise.
(*<arith_shift_insn>_multsi): Likewise.
(*<arith_shift_insn>_shiftsi): Likewise.
(subsi3_carryin): Set new type.
(*if_arith_move): Set new type.
(*if_move_arith): Set new type.
(define_attr "core_cycles"): Use new type.
* config/arm/arm-fixed.md (arm_ssatsihi_shift): Set autodetect_type.
* config/arm/thumb2.md (*orsi_not_shiftsi_si): Likewise.
(*thumb2_shiftsi3_short): Set new type.
* config/aarch64/falkor.md (falkor_alu_1_xyz): Use new type.
* config/aarch64/saphira.md (saphira_alu_1_xyz): Likewise.
* config/aarch64/thunderx.md (thunderx_arith_shift): Likewise.
* config/aarch64/thunderx2t99.md (thunderx2t99_alu_shift): Likewise.
* config/aarch64/thunderx3t110.md (thunderx3t110_alu_shift): Likewise.
(thunderx3t110_alu_shift1): Likewise.
* config/aarch64/tsv110.md (tsv110_alu_shift): Likewise.
* config/arm/arm1020e.md (1020alu_shift_op): Likewise.
* config/arm/arm1026ejs.md (alu_shift_op): Likewise.
* config/arm/arm1136jfs.md (11_alu_shift_op): Likewise.
* config/arm/arm926ejs.md (9_alu_op): Likewise.
* config/arm/cortex-a15.md (cortex_a15_alu_shift): Likewise.
* config/arm/cortex-a17.md (cortex_a17_alu_shiftimm): Likewise.
* config/arm/cortex-a5.md (cortex_a5_alu_shift): Likewise.
* config/arm/cortex-a53.md (cortex_a53_alu_shift): Likewise.
* config/arm/cortex-a57.md (cortex_a57_alu_shift): Likewise.
* config/arm/cortex-a7.md (cortex_a7_alu_shift): Likewise.
* config/arm/cortex-a8.md (cortex_a8_alu_shift): Likewise.
* config/arm/cortex-a9.md (cortex_a9_dp_shift): Likewise.
* config/arm/cortex-m4.md (cortex_m4_alu): Likewise.
* config/arm/cortex-m7.md (cortex_m7_alu_shift): Likewise.
* config/arm/cortex-r4.md (cortex_r4_alu_shift): Likewise.
* config/arm/exynos-m1.md (exynos_m1_alu_shift): Likewise.
* config/arm/fa526.md (526_alu_shift_op): Likewise.
* config/arm/fa606te.md (606te_alu_op): Likewise.
* config/arm/fa626te.md (626te_alu_shift_op): Likewise.
* config/arm/fa726te.md (726te_alu_shift_op): Likewise.
* config/arm/fmp626.md (mp626_alu_shift_op): Likewise.
* config/arm/marvell-pj4.md (pj4_shift): Likewise.
(pj4_shift_conds): Likewise.
(pj4_alu_shift): Likewise.
(pj4_alu_shift_conds): Likewise.
* config/arm/xgene1.md (xgene1_alu): Likewise.
* config/arm/arm.c (xscale_sched_adjust_cost): Likewise.
|
|
Following on from the previous commit to fix up the syntax for
add/sub/adds/subs and friends with a sign/zero-extended operand, this
patch removes the "mult" variants of these patterns which are all
redundant.
This patch removes the following patterns from the AArch64 backend:
*adds_mul_imm_<mode>
*subs_mul_imm_<mode>
*adds_<optab><mode>_multp2
*subs_<optab><mode>_multp2
*add_mul_imm_<mode>
*add_<optab><ALLX:mode>_mult_<GPI:mode>
*add_<optab><SHORT:mode>_mult_si_uxtw
*add_<optab><mode>_multp2
*add_<optab>si_multp2_uxtw
*add_uxt<mode>_multp2
*add_uxtsi_multp2_uxtw
*sub_mul_imm_<mode>
*sub_mul_imm_si_uxtw
*sub_<optab><mode>_multp2
*sub_<optab>si_multp2_uxtw
*sub_uxt<mode>_multp2
*sub_uxtsi_multp2_uxtw
*neg_mul_imm_<mode>2
*neg_mul_imm_si2_uxtw
Together with the following predicates which were used only by these
patterns:
aarch64_pwr_imm3
aarch64_pwr_2_si
aarch64_pwr_2_di
These patterns are all redundant since multiplications by powers of two
should be represented as shifts outside a (mem).
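For instance (a hedged illustration, not from the patch), outside a memory
address a power-of-two multiply is canonicalized as a shift, so the
existing shift patterns already cover code like:
    long
    f (long x, long y)
    {
      return y + x * 4;   /* combined as y + (x << 2): add x0, x1, x0, lsl 2 */
    }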
---
gcc/ChangeLog:
* config/aarch64/aarch64.md (*adds_mul_imm_<mode>): Delete.
(*subs_mul_imm_<mode>): Delete.
(*adds_<optab><mode>_multp2): Delete.
(*subs_<optab><mode>_multp2): Delete.
(*add_mul_imm_<mode>): Delete.
(*add_<optab><ALLX:mode>_mult_<GPI:mode>): Delete.
(*add_<optab><SHORT:mode>_mult_si_uxtw): Delete.
(*add_<optab><mode>_multp2): Delete.
(*add_<optab>si_multp2_uxtw): Delete.
(*add_uxt<mode>_multp2): Delete.
(*add_uxtsi_multp2_uxtw): Delete.
(*sub_mul_imm_<mode>): Delete.
(*sub_mul_imm_si_uxtw): Delete.
(*sub_<optab><mode>_multp2): Delete.
(*sub_<optab>si_multp2_uxtw): Delete.
(*sub_uxt<mode>_multp2): Delete.
(*sub_uxtsi_multp2_uxtw): Delete.
(*neg_mul_imm_<mode>2): Delete.
(*neg_mul_imm_si2_uxtw): Delete.
* config/aarch64/predicates.md (aarch64_pwr_imm3): Delete.
(aarch64_pwr_2_si): Delete.
(aarch64_pwr_2_di): Delete.
|
|
This patch introduces the mitigation for Straight Line Speculation past
the BLR instruction.
This mitigation replaces BLR instructions with a BL to a stub which uses
a BR to jump to the original value. These function stubs are then
appended with a speculation barrier to ensure no straight line
speculation happens after these jumps.
When optimising for speed we use a set of stubs for each function since
this should help the branch predictor make more accurate predictions
about where a stub should branch.
When optimising for size we use one set of stubs for all functions.
This set of stubs can have human readable names, and we are using
`__call_indirect_x<N>` for register x<N>.
When BTI branch protection is enabled the BLR instruction can jump to a
`BTI c` instruction using any register, while the BR instruction can
only jump to a `BTI c` instruction using the x16 or x17 registers.
Hence, in order to ensure this transformation is safe we move the value
of the original register into x16 and use x16 for the BR.
As an example, when optimising for size a
    BLR x0
instruction would get transformed to something like
    BL __call_indirect_x0
where __call_indirect_x0 labels a thunk that contains
__call_indirect_x0:
    MOV X16, X0
    BR X16
    <speculation barrier>
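In C terms, any indirect call can be subject to this transformation; a
hedged illustration, compiled with -mharden-sls=blr:
    int
    call_fn (int (*fn) (void))
    {
      return fn ();   /* emitted as BL __call_indirect_x<N> rather than BLR */
    }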
The first version of this patch used local symbols specific to a
compilation unit to try and avoid relocations.
This was mistaken since functions coming from the same compilation unit
can still be in different sections, and the assembler will insert
relocations at jumps between sections.
On any relocation the linker is permitted to emit a veneer to handle
jumps between symbols that are very far apart. The registers x16 and
x17 may be clobbered by these veneers.
Hence the function stubs cannot rely on the values of x16 and x17 being
the same as just before the function stub is called.
The same applies to the hot/cold partitioning of single functions,
so function-local stubs have the same restriction.
This updated version of the patch never emits function stubs for x16 and
x17, and instead forces other registers to be used.
Given the above, there is now no benefit to local symbols (since they
are not enough to avoid dealing with linker intricacies). This patch
now uses global symbols with hidden visibility each stored in their own
COMDAT section. This means stubs can be shared between compilation
units while still avoiding the PLT indirection.
This patch also removes the `__call_indirect_x30` stub (and
function-local equivalent) which would simply jump back to the original
location.
The function-local stubs are emitted to the assembly output file in one
chunk, which means we need not add the speculation barrier directly
after each one.
This is because we know for certain that the instructions directly after
the BR in all but the last function stub will be from another one of
these stubs and hence will not contain a speculation gadget.
Instead we add a speculation barrier at the end of the sequence of
stubs.
The global stubs are emitted in COMDAT/.linkonce sections by
themselves so that the linker can remove duplicates from multiple object
files. This means they are not emitted in one chunk, and each one must
include the speculation barrier.
Another difference is that since the global stubs are shared across
compilation units we do not know that all functions will be targeting an
architecture supporting the SB instruction.
Rather than provide multiple stubs for each architecture, we provide a
stub that will work for all architectures -- using the DSB+ISB barrier.
This mitigation does not apply for BLR instructions in the following
places:
- Some accesses to thread-local variables use a code sequence with a BLR
instruction. This code sequence is part of the binary interface between
compiler and linker. If this BLR instruction needs to be mitigated, it'd
probably be best to do so in the linker. It seems that the code sequence
for thread-local variable access is unlikely to lead to a Spectre Revelation
Gadget.
- PLT stubs are produced by the linker and each contains a BLR instruction.
It seems that a Spectre Revelation Gadget might appear, at most, only after
the last PLT stub.
Testing:
Bootstrap and regtest on AArch64
(with BOOT_CFLAGS="-mharden-sls=retbr,blr")
Used a temporary hack(1) in gcc-dg.exp to use these options on every
test in the testsuite, a slight modification to emit the speculation
barrier after every function stub, and a script to check that the
output never emitted a BLR, or unmitigated BR or RET instruction.
Similar on an aarch64-none-elf cross-compiler.
1) Temporary hack emitted a speculation barrier at the end of every stub
function, and used a script to ensure that:
a) Every RET or BR is immediately followed by a speculation barrier.
b) No BLR instruction is emitted by compiler.
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (aarch64_indirect_call_asm):
New declaration.
* config/aarch64/aarch64.c (aarch64_regno_regclass): Handle new
stub registers class.
(aarch64_class_max_nregs): Likewise.
(aarch64_register_move_cost): Likewise.
(aarch64_sls_shared_thunks): Global array to store stub labels.
(aarch64_sls_emit_function_stub): New.
(aarch64_create_blr_label): New.
(aarch64_sls_emit_blr_function_thunks): New.
(aarch64_sls_emit_shared_blr_thunks): New.
(aarch64_asm_file_end): New.
(aarch64_indirect_call_asm): New.
(TARGET_ASM_FILE_END): Use aarch64_asm_file_end.
(TARGET_ASM_FUNCTION_EPILOGUE): Use
aarch64_sls_emit_blr_function_thunks.
* config/aarch64/aarch64.h (STB_REGNUM_P): New.
(enum reg_class): Add STUB_REGS class.
(machine_function): Introduce `call_via` array for
function-local stub labels.
* config/aarch64/aarch64.md (*call_insn, *call_value_insn): Use
aarch64_indirect_call_asm to emit code when hardening BLR
instructions.
* config/aarch64/constraints.md (Ucr): New constraint
representing registers for indirect calls. Is GENERAL_REGS
usually, and STUB_REGS when hardening BLR instruction against
SLS.
* config/aarch64/predicates.md (aarch64_general_reg): STUB_REGS class
is also a general register.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c: New test.
* gcc.target/aarch64/sls-mitigation/sls-miti-blr.c: New test.
|
|
We take no action to ensure the SVE vector size is large enough. It is
left to the user to check that before compiling code that uses this
intrinsic, or before running such a program on a given machine.
The main difference between ld1ro and ld1rq is in the allowed offsets;
the implementation difference is that ld1ro is implemented using integer
modes, since there are no pre-existing vector modes of the relevant size.
Adding new vector modes simply for this intrinsic seems to make the code
less tidy.
Specifications can be found under the "Arm C Language Extensions for
Scalable Vector Extension" title at
https://developer.arm.com/architectures/system-architectures/software-standards/acle
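A hedged sketch of the intrinsic from the user's side (assuming the
svld1ro_f32 spelling from the specification above; it is the caller's
responsibility that the runtime vector length is at least 256 bits):
    #include <arm_sve.h>
    svfloat32_t
    load_replicated (svbool_t pg, const float *base)
    {
      /* Load a 256-bit block from base and replicate it across
         the whole SVE vector.  */
      return svld1ro_f32 (pg, base);
    }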
gcc/ChangeLog:
2020-01-17 Matthew Malcomson <matthew.malcomson@arm.com>
* config/aarch64/aarch64-protos.h
(aarch64_sve_ld1ro_operand_p): New.
* config/aarch64/aarch64-sve-builtins-base.cc
(class load_replicate): New.
(class svld1ro_impl): New.
(class svld1rq_impl): Change to inherit from load_replicate.
(svld1ro): New SVE intrinsic function base.
* config/aarch64/aarch64-sve-builtins-base.def (svld1ro):
New DEF_SVE_FUNCTION.
* config/aarch64/aarch64-sve-builtins-base.h
(svld1ro): New decl.
* config/aarch64/aarch64-sve-builtins.cc
(function_expander::add_mem_operand): Modify assert to allow
OImode.
* config/aarch64/aarch64-sve.md (@aarch64_sve_ld1ro<mode>): New
pattern.
* config/aarch64/aarch64.c
(aarch64_sve_ld1rq_operand_p): Implement in terms of ...
(aarch64_sve_ld1rq_ld1ro_operand_p): This.
(aarch64_sve_ld1ro_operand_p): New.
* config/aarch64/aarch64.md (UNSPEC_LD1RO): New unspec.
* config/aarch64/constraints.md (UOb,UOh,UOw,UOd): New.
* config/aarch64/predicates.md
(aarch64_sve_ld1ro_operand_{b,h,w,d}): New.
gcc/testsuite/ChangeLog:
2020-01-17 Matthew Malcomson <matthew.malcomson@arm.com>
* gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c: New test.
|
|
It helps the SVE2 ACLE support if aarch64_sve_arith_immediate_p and
aarch64_sve_sqadd_sqsub_immediate_p accept scalars as well as vectors.
2020-01-09 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64-protos.h (aarch64_sve_arith_immediate_p)
(aarch64_sve_sqadd_sqsub_immediate_p): Add a machine_mode argument.
* config/aarch64/aarch64.c (aarch64_sve_arith_immediate_p)
(aarch64_sve_sqadd_sqsub_immediate_p): Likewise. Handle scalar
immediates as well as vector ones.
* config/aarch64/predicates.md (aarch64_sve_arith_immediate)
(aarch64_sve_sub_arith_immediate, aarch64_sve_qadd_immediate)
(aarch64_sve_qsub_immediate): Update calls accordingly.
From-SVN: r280059
|
|
From-SVN: r279813
|
|
2019-11-19 Dennis Zhang <dennis.zhang@arm.com>
* config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
(aarch64_init_memtag_builtins): New.
(AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
(aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
(aarch64_expand_builtin_memtag): New.
(aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
(AARCH64_BUILTIN_SUBCODE): New macro.
(aarch64_resolve_overloaded_memtag): New.
(aarch64_resolve_overloaded_builtin_general): New. Call
aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_MEMORY_TAGGING when enabled.
(aarch64_resolve_overloaded_builtin): Call
aarch64_resolve_overloaded_builtin_general.
* config/aarch64/aarch64-protos.h
(aarch64_resolve_overloaded_builtin_general): New declaration.
* config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
(TARGET_MEMTAG): Likewise.
* config/aarch64/aarch64.md (UNSPEC_GEN_TAG): New unspec.
(UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE): Likewise.
(irg, gmi, subp, addg, ldg, stg): New instructions.
* config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro.
(__arm_mte_exclude_tag, __arm_mte_ptrdiff): Likewise.
(__arm_mte_increment_tag, __arm_mte_set_tag): Likewise.
(__arm_mte_get_tag): Likewise.
* config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
(aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
* config/arm/types.md (memtag): New.
* doc/invoke.texi (-memtag): Update description.
2019-11-19 Dennis Zhang <dennis.zhang@arm.com>
* gcc.target/aarch64/acle/memtag_1.c: New test.
* gcc.target/aarch64/acle/memtag_2.c: New test.
* gcc.target/aarch64/acle/memtag_3.c: New test.
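A hedged sketch of one of the new intrinsics in use (assumes the memtag
extension is enabled, e.g. -march=armv8.5-a+memtag):
    #include <arm_acle.h>
    void *
    make_tagged (void *p)
    {
      /* Insert a random, non-excluded tag into the pointer.  */
      return __arm_mte_create_random_tag (p, 0);
    }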
From-SVN: r278444
|
|
This patch adds optabs that check whether a read followed by a write
or a write followed by a read can be divided into interleaved byte
accesses without changing the dependencies between the bytes.
This is one of the uses of the SVE2 WHILERW and WHILEWR instructions.
(The instructions can also be used to limit the VF at runtime,
but that's future work.)
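As a hedged illustration, this is the kind of loop whose vectorisation
needs such a check: whether the byte reads of src can safely be
interleaved with the byte writes to dst when the buffers might overlap.
    void
    f (char *dst, const char *src, int n)
    {
      for (int i = 0; i < n; i++)
        dst[i] = src[i] + 1;
    }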
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* doc/sourcebuild.texi (vect_check_ptrs): Document.
* optabs.def (check_raw_ptrs_optab, check_war_ptrs_optab): New optabs.
* doc/md.texi: Document them.
* internal-fn.def (IFN_CHECK_RAW_PTRS, IFN_CHECK_WAR_PTRS): New
internal functions.
* internal-fn.h (internal_check_ptrs_fn_supported_p): Declare.
* internal-fn.c (check_ptrs_direct): New macro.
(expand_check_ptrs_optab_fn): Likewise.
(direct_check_ptrs_optab_supported_p): Likewise.
(internal_check_ptrs_fn_supported_p): New function.
* tree-data-ref.c: Include internal-fn.h.
(create_ifn_alias_checks): New function.
(create_intersect_range_checks): Use it.
* config/aarch64/iterators.md (SVE2_WHILE_PTR): New int iterator.
(optab, cmp_op): Handle it.
(raw_war, unspec): New int attributes.
* config/aarch64/aarch64.md (UNSPEC_WHILERW, UNSPEC_WHILE_WR): New
constants.
* config/aarch64/predicates.md (aarch64_bytes_per_sve_vector_operand):
New predicate.
* config/aarch64/aarch64-sve2.md (check_<raw_war>_ptrs<mode>): New
expander.
(@aarch64_sve2_while<cmp_op><GPI:mode><PRED_ALL:mode>_ptest): New
pattern.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_check_ptrs):
New procedure.
* gcc.dg/vect/vect-alias-check-14.c: Expect IFN_CHECK_WAR to be
used, if available.
* gcc.dg/vect/vect-alias-check-15.c: Likewise.
* gcc.dg/vect/vect-alias-check-16.c: Likewise, for IFN_CHECK_RAW.
* gcc.target/aarch64/sve2/whilerw_1.c: New test.
* gcc.target/aarch64/sve2/whilewr_1.c: Likewise.
* gcc.target/aarch64/sve2/whilewr_2.c: Likewise.
From-SVN: r278414
|
|
This patch adds support for arm_sve.h. I've tried to split all the
groundwork out into separate patches, so this is mostly adding new code
rather than changing existing code.
The C++ frontend seems to handle correct ACLE code without modification,
even in length-agnostic mode. The C frontend is close; the only correct
construct I know it doesn't handle is initialisation. E.g.:
    svbool_t pg = svptrue_b8 ();
produces:
    variable-sized object may not be initialized
although:
    svbool_t pg;
    pg = svptrue_b8 ();
works fine. This can be fixed by changing:
    {
      /* A complete type is ok if size is fixed.  */
    - if (TREE_CODE (TYPE_SIZE (TREE_TYPE (decl))) != INTEGER_CST
    + if (!poly_int_tree_p (TYPE_SIZE (TREE_TYPE (decl)))
          || C_DECL_VARIABLE_SIZE (decl))
        {
          error ("variable-sized object may not be initialized");
in c/c-decl.c:start_decl.
Invalid code is likely to trigger ICEs, so this isn't ready for general
use yet. However, it seemed better to apply the patch now and deal with
diagnosing invalid code as a follow-up. For one thing, it means that
we'll be able to provide testcases for middle-end changes related
to SVE vectors, which has been a problem until now. (I already have
a series of such patches lined up.)
The patch includes some tests, but the main ones need to wait until the
PCS support has been applied.
2019-10-29 Richard Sandiford <richard.sandiford@arm.com>
Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
gcc/
* config.gcc (aarch64*-*-*): Add arm_sve.h to extra_headers.
Add aarch64-sve-builtins.o, aarch64-sve-builtins-shapes.o and
aarch64-sve-builtins-base.o to extra_objs. Add
aarch64-sve-builtins.h and aarch64-sve-builtins.cc to target_gtfiles.
* config/aarch64/t-aarch64 (aarch64-sve-builtins.o): New rule.
(aarch64-sve-builtins-shapes.o): Likewise.
(aarch64-sve-builtins-base.o): New rules.
* config/aarch64/aarch64-c.c (aarch64_pragma_aarch64): New function.
(aarch64_resolve_overloaded_builtin): Likewise.
(aarch64_check_builtin_call): Likewise.
(aarch64_register_pragmas): Install aarch64_resolve_overloaded_builtin
and aarch64_check_builtin_call in targetm. Register the GCC aarch64
pragma.
* config/aarch64/aarch64-protos.h (AARCH64_FOR_SVPRFOP): New macro.
(aarch64_svprfop): New enum.
(AARCH64_BUILTIN_SVE): New aarch64_builtin_class enum value.
(aarch64_sve_int_mode, aarch64_sve_data_mode): Declare.
(aarch64_fold_sve_cnt_pat, aarch64_output_sve_prefetch): Likewise.
(aarch64_output_sve_cnt_pat_immediate): Likewise.
(aarch64_output_sve_ptrues, aarch64_sve_ptrue_svpattern_p): Likewise.
(aarch64_sve_sqadd_sqsub_immediate_p, aarch64_sve_ldff1_operand_p)
(aarch64_sve_ldnf1_operand_p, aarch64_sve_prefetch_operand_p)
(aarch64_ptrue_all_mode, aarch64_convert_sve_data_to_pred): Likewise.
(aarch64_expand_sve_dupq, aarch64_replace_reg_mode): Likewise.
(aarch64_sve::init_builtins, aarch64_sve::handle_arm_sve_h): Likewise.
(aarch64_sve::builtin_decl, aarch64_sve::builtin_type_p): Likewise.
(aarch64_sve::mangle_builtin_type): Likewise.
(aarch64_sve::resolve_overloaded_builtin): Likewise.
(aarch64_sve::check_builtin_call, aarch64_sve::gimple_fold_builtin)
(aarch64_sve::expand_builtin): Likewise.
* config/aarch64/aarch64.c (aarch64_sve_data_mode): Make public.
(aarch64_sve_int_mode): Likewise.
(aarch64_ptrue_all_mode): New function.
(aarch64_convert_sve_data_to_pred): Make public.
(svprfop_token): New function.
(aarch64_output_sve_prefetch): Likewise.
(aarch64_fold_sve_cnt_pat): Likewise.
(aarch64_output_sve_cnt_pat_immediate): Likewise.
(aarch64_sve_move_pred_via_while): Use gen_while with UNSPEC_WHILE_LO
instead of gen_while_ult.
(aarch64_replace_reg_mode): Make public.
(aarch64_init_builtins): Call aarch64_sve::init_builtins.
(aarch64_fold_builtin): Handle AARCH64_BUILTIN_SVE.
(aarch64_gimple_fold_builtin, aarch64_expand_builtin): Likewise.
(aarch64_builtin_decl, aarch64_builtin_reciprocal): Likewise.
(aarch64_mangle_type): Call aarch64_sve::mangle_builtin_type.
(aarch64_sve_sqadd_sqsub_immediate_p): New function.
(aarch64_sve_ptrue_svpattern_p): Likewise.
(aarch64_sve_pred_valid_immediate): Check
aarch64_sve_ptrue_svpattern_p.
(aarch64_sve_ldff1_operand_p, aarch64_sve_ldnf1_operand_p)
(aarch64_sve_prefetch_operand_p, aarch64_output_sve_ptrues): New
functions.
* config/aarch64/aarch64.md (UNSPEC_LDNT1_SVE, UNSPEC_STNT1_SVE)
(UNSPEC_LDFF1_GATHER, UNSPEC_PTRUE, UNSPEC_WHILE_LE, UNSPEC_WHILE_LS)
(UNSPEC_WHILE_LT, UNSPEC_CLASTA, UNSPEC_UPDATE_FFR)
(UNSPEC_UPDATE_FFRT, UNSPEC_RDFFR, UNSPEC_WRFFR)
(UNSPEC_SVE_LANE_SELECT, UNSPEC_SVE_CNT_PAT, UNSPEC_SVE_PREFETCH)
(UNSPEC_SVE_PREFETCH_GATHER, UNSPEC_SVE_COMPACT, UNSPEC_SVE_SPLICE):
New unspecs.
* config/aarch64/iterators.md (SI_ONLY, DI_ONLY, VNx8HI_ONLY)
(VNx2DI_ONLY, SVE_PARTIAL, VNx8_NARROW, VNx8_WIDE, VNx4_NARROW)
(VNx4_WIDE, VNx2_NARROW, VNx2_WIDE, PRED_HSD): New mode iterators.
(UNSPEC_ADR, UNSPEC_BRKA, UNSPEC_BRKB, UNSPEC_BRKN, UNSPEC_BRKPA)
(UNSPEC_BRKPB, UNSPEC_PFIRST, UNSPEC_PNEXT, UNSPEC_CNTP, UNSPEC_SADDV)
(UNSPEC_UADDV, UNSPEC_FMLA, UNSPEC_FMLS, UNSPEC_FEXPA, UNSPEC_FTMAD)
(UNSPEC_FTSMUL, UNSPEC_FTSSEL, UNSPEC_COND_CMPEQ_WIDE): New unspecs.
(UNSPEC_COND_CMPGE_WIDE, UNSPEC_COND_CMPGT_WIDE): Likewise.
(UNSPEC_COND_CMPHI_WIDE, UNSPEC_COND_CMPHS_WIDE): Likewise.
(UNSPEC_COND_CMPLE_WIDE, UNSPEC_COND_CMPLO_WIDE): Likewise.
(UNSPEC_COND_CMPLS_WIDE, UNSPEC_COND_CMPLT_WIDE): Likewise.
(UNSPEC_COND_CMPNE_WIDE, UNSPEC_COND_FCADD90, UNSPEC_COND_FCADD270)
(UNSPEC_COND_FCMLA, UNSPEC_COND_FCMLA90, UNSPEC_COND_FCMLA180)
(UNSPEC_COND_FCMLA270, UNSPEC_COND_FMAX, UNSPEC_COND_FMIN): Likewise.
(UNSPEC_COND_FMULX, UNSPEC_COND_FRECPX, UNSPEC_COND_FSCALE): Likewise.
(UNSPEC_LASTA, UNSPEC_ASHIFT_WIDE, UNSPEC_ASHIFTRT_WIDE): Likewise.
(UNSPEC_LSHIFTRT_WIDE, UNSPEC_LDFF1, UNSPEC_LDNF1): Likewise.
(Vesize): Handle partial vector modes.
(self_mask, narrower_mask, sve_lane_con, sve_lane_pair_con): New
mode attributes.
(UBINQOPS, ANY_PLUS, SAT_PLUS, ANY_MINUS, SAT_MINUS): New code
iterators.
(s, paired_extend, inc_dec): New code attributes.
(SVE_INT_ADDV, CLAST, LAST): New int iterators.
(SVE_INT_UNARY): Add UNSPEC_RBIT.
(SVE_FP_UNARY, SVE_FP_UNARY_INT): New int iterators.
(SVE_FP_BINARY, SVE_FP_BINARY_INT): Likewise.
(SVE_COND_FP_UNARY): Add UNSPEC_COND_FRECPX.
(SVE_COND_FP_BINARY): Add UNSPEC_COND_FMAX, UNSPEC_COND_FMIN and
UNSPEC_COND_FMULX.
(SVE_COND_FP_BINARY_INT, SVE_COND_FP_ADD): New int iterators.
(SVE_COND_FP_SUB, SVE_COND_FP_MUL): Likewise.
(SVE_COND_FP_BINARY_I1): Add UNSPEC_COND_FMAX and UNSPEC_COND_FMIN.
(SVE_COND_FP_BINARY_REG): Add UNSPEC_COND_FMULX.
(SVE_COND_FCADD, SVE_COND_FP_MAXMIN, SVE_COND_FCMLA)
(SVE_COND_INT_CMP_WIDE, SVE_FP_TERNARY_LANE, SVE_CFP_TERNARY_LANE)
(SVE_WHILE, SVE_SHIFT_WIDE, SVE_LDFF1_LDNF1, SVE_BRK_UNARY)
(SVE_BRK_BINARY, SVE_PITER): New int iterators.
(optab): Handle UNSPEC_SADDV, UNSPEC_UADDV, UNSPEC_FRECPE,
UNSPEC_FRECPS, UNSPEC_RSQRTE, UNSPEC_RSQRTS, UNSPEC_RBIT,
UNSPEC_SMUL_HIGHPART, UNSPEC_UMUL_HIGHPART, UNSPEC_FMLA, UNSPEC_FMLS,
UNSPEC_FCMLA, UNSPEC_FCMLA90, UNSPEC_FCMLA180, UNSPEC_FCMLA270,
UNSPEC_FEXPA, UNSPEC_FTSMUL, UNSPEC_FTSSEL, UNSPEC_COND_FCADD90,
UNSPEC_COND_FCADD270, UNSPEC_COND_FCMLA, UNSPEC_COND_FCMLA90,
UNSPEC_COND_FCMLA180, UNSPEC_COND_FCMLA270, UNSPEC_COND_FMAX,
UNSPEC_COND_FMIN, UNSPEC_COND_FMULX, UNSPEC_COND_FRECPX and
UNSPEC_COND_FSCALE.
(maxmin_uns): Handle UNSPEC_COND_FMAX and UNSPEC_COND_FMIN.
(binqops_op, binqops_op_rev, last_op): New int attributes.
(su): Handle UNSPEC_SADDV and UNSPEC_UADDV.
(fn, ab): New int attributes.
(cmp_op): Handle UNSPEC_COND_CMP*_WIDE and UNSPEC_WHILE_*.
(while_optab_cmp, brk_op, sve_pred_op): New int attributes.
(sve_int_op): Handle UNSPEC_SMUL_HIGHPART, UNSPEC_UMUL_HIGHPART,
UNSPEC_ASHIFT_WIDE, UNSPEC_ASHIFTRT_WIDE, UNSPEC_LSHIFTRT_WIDE and
UNSPEC_RBIT.
(sve_fp_op): Handle UNSPEC_FRECPE, UNSPEC_FRECPS, UNSPEC_RSQRTE,
UNSPEC_RSQRTS, UNSPEC_FMLA, UNSPEC_FMLS, UNSPEC_FEXPA, UNSPEC_FTSMUL,
UNSPEC_FTSSEL, UNSPEC_COND_FMAX, UNSPEC_COND_FMIN, UNSPEC_COND_FMULX,
UNSPEC_COND_FRECPX and UNSPEC_COND_FSCALE.
(sve_fp_op_rev): Handle UNSPEC_COND_FMAX, UNSPEC_COND_FMIN and
UNSPEC_COND_FMULX.
(rot): Handle UNSPEC_COND_FCADD* and UNSPEC_COND_FCMLA*.
(brk_reg_con, brk_reg_opno): New int attributes.
(sve_pred_fp_rhs1_operand, sve_pred_fp_rhs2_operand): Handle
UNSPEC_COND_FMAX, UNSPEC_COND_FMIN and UNSPEC_COND_FMULX.
(sve_pred_fp_rhs2_immediate): Handle UNSPEC_COND_FMAX and
UNSPEC_COND_FMIN.
(max_elem_bits): New int attribute.
(min_elem_bits): Handle UNSPEC_RBIT.
* config/aarch64/predicates.md (subreg_lowpart_operator): Handle
TRUNCATE as well as SUBREG.
(ascending_int_parallel, aarch64_simd_reg_or_minus_one)
(aarch64_sve_ldff1_operand, aarch64_sve_ldnf1_operand)
(aarch64_sve_prefetch_operand, aarch64_sve_ptrue_svpattern_immediate)
(aarch64_sve_qadd_immediate, aarch64_sve_qsub_immediate)
(aarch64_sve_gather_immediate_b, aarch64_sve_gather_immediate_h)
(aarch64_sve_gather_immediate_w, aarch64_sve_gather_immediate_d)
(aarch64_sve_sqadd_operand, aarch64_sve_gather_offset_b)
(aarch64_sve_gather_offset_h, aarch64_sve_gather_offset_w)
(aarch64_sve_gather_offset_d, aarch64_gather_scale_operand_b)
(aarch64_gather_scale_operand_h): New predicates.
* config/aarch64/constraints.md (UPb, UPd, UPh, UPw, Utf, Utn, vgb)
(vgd, vgh, vgw, vsQ, vsS): New constraints.
* config/aarch64/aarch64-sve.md: Add a note on the FFR handling.
(*aarch64_sve_reinterpret<mode>): Allow any source register
instead of requiring an exact match.
(*aarch64_sve_ptruevnx16bi_cc, *aarch64_sve_ptrue<mode>_cc)
(*aarch64_sve_ptruevnx16bi_ptest, *aarch64_sve_ptrue<mode>_ptest)
(aarch64_wrffr, aarch64_update_ffr_for_load, aarch64_copy_ffr_to_ffrt)
(aarch64_rdffr, aarch64_rdffr_z, *aarch64_rdffr_z_ptest)
(*aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc)
(aarch64_update_ffrt): New patterns.
(@aarch64_load_<ANY_EXTEND:optab><VNx8_WIDE:mode><VNx8_NARROW:mode>)
(@aarch64_load_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>)
(@aarch64_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>)
(@aarch64_ld<fn>f1<mode>): New patterns.
(@aarch64_ld<fn>f1_<ANY_EXTEND:optab><VNx8_WIDE:mode><VNx8_NARROW:mode>)
(@aarch64_ld<fn>f1_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>)
(@aarch64_ld<fn>f1_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>)
(@aarch64_ldnt1<mode>): New patterns.
(gather_load<mode>): Use aarch64_sve_gather_offset_<Vesize> for
the scalar part of the address.
(mask_gather_load<SVE_S:mode>): Use aarch64_sve_gather_offset_w for the
scalar part of the address and add an alternative for handling
nonzero offsets.
(mask_gather_load<SVE_D:mode>): Likewise aarch64_sve_gather_offset_d.
(*mask_gather_load<mode>_sxtw, *mask_gather_load<mode>_uxtw)
(@aarch64_gather_load_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>)
(@aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>)
(*aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_sxtw)
(*aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_uxtw)
(@aarch64_ldff1_gather<SVE_S:mode>, @aarch64_ldff1_gather<SVE_D:mode>)
(*aarch64_ldff1_gather<mode>_sxtw, *aarch64_ldff1_gather<mode>_uxtw)
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>)
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>)
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_sxtw)
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_uxtw)
(@aarch64_sve_prefetch<mode>): New patterns.
(@aarch64_sve_gather_prefetch<SVE_I:mode><VNx4SI_ONLY:mode>)
(@aarch64_sve_gather_prefetch<SVE_I:mode><VNx2DI_ONLY:mode>)
(*aarch64_sve_gather_prefetch<SVE_I:mode><VNx2DI_ONLY:mode>_sxtw)
(*aarch64_sve_gather_prefetch<SVE_I:mode><VNx2DI_ONLY:mode>_uxtw)
(@aarch64_store_trunc<VNx8_NARROW:mode><VNx8_WIDE:mode>)
(@aarch64_store_trunc<VNx4_NARROW:mode><VNx4_WIDE:mode>)
(@aarch64_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>)
(@aarch64_stnt1<mode>): New patterns.
(scatter_store<mode>): Use aarch64_sve_gather_offset_<Vesize> for
the scalar part of the address.
(mask_scatter_store<SVE_S:mode>): Use aarch64_sve_gather_offset_w for
the scalar part of the address and add an alternative for handling
nonzero offsets.
(mask_scatter_store<SVE_D:mode>): Likewise aarch64_sve_gather_offset_d.
(*mask_scatter_store<mode>_sxtw, *mask_scatter_store<mode>_uxtw)
(@aarch64_scatter_store_trunc<VNx4_NARROW:mode><VNx4_WIDE:mode>)
(@aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>)
(*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_sxtw)
(*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_uxtw):
New patterns.
(vec_duplicate<mode>): Use QI as the mode of the input operand.
(extract_last_<mode>): Generalize to...
(@extract_<LAST:last_op>_<mode>): ...this.
(*<SVE_INT_UNARY:optab><mode>2): Rename to...
(@aarch64_pred_<SVE_INT_UNARY:optab><mode>): ...this.
(@cond_<SVE_INT_UNARY:optab><mode>): New expander.
(@aarch64_pred_sxt<SVE_HSDI:mode><SVE_PARTIAL:mode>): New pattern.
(@aarch64_cond_sxt<SVE_HSDI:mode><SVE_PARTIAL:mode>): Likewise.
(@aarch64_pred_cnot<mode>, @cond_cnot<mode>): New expanders.
(@aarch64_sve_<SVE_FP_UNARY_INT:optab><mode>): New pattern.
(@aarch64_sve_<SVE_FP_UNARY:optab><mode>): Likewise.
(*<SVE_COND_FP_UNARY:optab><mode>2): Rename to...
(@aarch64_pred_<SVE_COND_FP_UNARY:optab><mode>): ...this.
(@cond_<SVE_COND_FP_UNARY:optab><mode>): New expander.
(*<SVE_INT_BINARY_IMM:optab><mode>3): Rename to...
(@aarch64_pred_<SVE_INT_BINARY_IMM:optab><mode>): ...this.
(@aarch64_adr<mode>, *aarch64_adr_sxtw): New patterns.
(*aarch64_adr_uxtw_unspec): Likewise.
(*aarch64_adr_uxtw): Rename to...
(*aarch64_adr_uxtw_and): ...this.
(@aarch64_adr<mode>_shift): New expander.
(*aarch64_adr_shift_sxtw): New pattern.
(aarch64_<su>abd<mode>_3): Rename to...
(@aarch64_pred_<su>abd<mode>): ...this.
(<su>abd<mode>_3): Update accordingly.
(@aarch64_cond_<su>abd<mode>): New expander.
(@aarch64_<SBINQOPS:su_optab><optab><mode>): New pattern.
(@aarch64_<UBINQOPS:su_optab><optab><mode>): Likewise.
(*<su>mul<mode>3_highpart): Rename to...
(@aarch64_pred_<optab><mode>): ...this.
(@cond_<MUL_HIGHPART:optab><mode>): New expander.
(*cond_<MUL_HIGHPART:optab><mode>_2): New pattern.
(*cond_<MUL_HIGHPART:optab><mode>_z): Likewise.
(*<SVE_INT_BINARY_SD:optab><mode>3): Rename to...
(@aarch64_pred_<SVE_INT_BINARY_SD:optab><mode>): ...this.
(cond_<SVE_INT_BINARY_SD:optab><mode>): Add a "@" marker.
(@aarch64_bic<mode>, @cond_bic<mode>): New expanders.
(*v<ASHIFT:optab><mode>3): Rename to...
(@aarch64_pred_<ASHIFT:optab><mode>): ...this.
(@aarch64_sve_<SVE_SHIFT_WIDE:sve_int_op><mode>): New pattern.
(@cond_<SVE_SHIFT_WIDE:sve_int_op><mode>): New expander.
(*cond_<SVE_SHIFT_WIDE:sve_int_op><mode>_m): New pattern.
(*cond_<SVE_SHIFT_WIDE:sve_int_op><mode>_z): Likewise.
(@cond_asrd<mode>): New expander.
(*cond_asrd<mode>_2, *cond_asrd<mode>_z): New patterns.
(sdiv_pow2<mode>3): Expand to *cond_asrd<mode>_2.
(*sdiv_pow2<mode>3): Delete.
(@cond_<SVE_COND_FP_BINARY_INT:optab><mode>): New expander.
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_2): New pattern.
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_any): Likewise.
(@aarch64_sve_<SVE_FP_BINARY:optab><mode>): New pattern.
(@aarch64_sve_<SVE_FP_BINARY_INT:optab><mode>): Likewise.
(*<SVE_COND_FP_BINARY_REG:optab><mode>3): Rename to...
(@aarch64_pred_<SVE_COND_FP_BINARY_REG:optab><mode>): ...this.
(@aarch64_pred_<SVE_COND_FP_BINARY_INT:optab><mode>): New pattern.
(cond_<SVE_COND_FP_BINARY:optab><mode>): Add a "@" marker.
(*add<SVE_F:mode>3): Rename to...
(@aarch64_pred_add<SVE_F:mode>): ...this and add alternatives
for SVE_STRICT_GP.
(@aarch64_pred_<SVE_COND_FCADD:optab><mode>): New pattern.
(@cond_<SVE_COND_FCADD:optab><mode>): New expander.
(*cond_<SVE_COND_FCADD:optab><mode>_2): New pattern.
(*cond_<SVE_COND_FCADD:optab><mode>_any): Likewise.
(*sub<SVE_F:mode>3): Rename to...
(@aarch64_pred_sub<SVE_F:mode>): ...this and add alternatives
for SVE_STRICT_GP.
(@aarch64_pred_abd<SVE_F:mode>): New expander.
(*fabd<SVE_F:mode>3): Rename to...
(*aarch64_pred_abd<SVE_F:mode>): ...this.
(@aarch64_cond_abd<SVE_F:mode>): New expander.
(*mul<SVE_F:mode>3): Rename to...
(@aarch64_pred_<SVE_F:optab><mode>): ...this and add alternatives
for SVE_STRICT_GP.
(@aarch64_mul_lane_<SVE_F:mode>): New pattern.
(*<SVE_COND_FP_MAXMIN_PUBLIC:optab><mode>3): Rename and generalize
to...
(@aarch64_pred_<SVE_COND_FP_MAXMIN:optab><mode>): ...this.
(*<LOGICAL:optab><PRED_ALL:mode>3_ptest): New pattern.
(*<nlogical><PRED_ALL:mode>3): Rename to...
(aarch64_pred_<nlogical><PRED_ALL:mode>_z): ...this.
(*<nlogical><PRED_ALL:mode>3_cc): New pattern.
(*<nlogical><PRED_ALL:mode>3_ptest): Likewise.
(*<logical_nn><PRED_ALL:mode>3): Rename to...
(aarch64_pred_<logical_nn><mode>_z): ...this.
(*<logical_nn><PRED_ALL:mode>3_cc): New pattern.
(*<logical_nn><PRED_ALL:mode>3_ptest): Likewise.
(*fma<SVE_I:mode>4): Rename to...
(@aarch64_pred_fma<SVE_I:mode>): ...this.
(*fnma<SVE_I:mode>4): Rename to...
(@aarch64_pred_fnma<SVE_I:mode>): ...this.
(@aarch64_<sur>dot_prod_lane<vsi2qi>): New pattern.
(*<SVE_FP_TERNARY:optab><mode>4): Rename to...
(@aarch64_pred_<SVE_FP_TERNARY:optab><mode>): ...this.
(cond_<SVE_FP_TERNARY:optab><mode>): Add a "@" marker.
(@aarch64_<SVE_FP_TERNARY_LANE:optab>_lane_<mode>): New pattern.
(@aarch64_pred_<SVE_COND_FCMLA:optab><mode>): Likewise.
(@cond_<SVE_COND_FCMLA:optab><mode>): New expander.
(*cond_<SVE_COND_FCMLA:optab><mode>_4): New pattern.
(*cond_<SVE_COND_FCMLA:optab><mode>_any): Likewise.
(@aarch64_<FCMLA:optab>_lane_<mode>): Likewise.
(@aarch64_sve_tmad<mode>): Likewise.
(vcond_mask_<SVE_ALL:mode><vpred>): Add a "@" marker.
(*aarch64_sel_dup<mode>): Rename to...
(@aarch64_sel_dup<mode>): ...this.
(@aarch64_pred_cmp<cmp_op><SVE_I:mode>_wide): New pattern.
(*aarch64_pred_cmp<cmp_op><SVE_I:mode>_wide_cc): Likewise.
(*aarch64_pred_cmp<cmp_op><SVE_I:mode>_wide_ptest): Likewise.
(@while_ult<GPI:mode><PRED_ALL:mode>): Generalize to...
(@while_<while_optab_cmp><GPI:mode><PRED_ALL:mode>): ...this.
(*while_ult<GPI:mode><PRED_ALL:mode>_cc): Generalize to...
(*while_<while_optab_cmp><GPI:mode><PRED_ALL:mode>_cc): ...this.
(*while_<while_optab_cmp><GPI:mode><PRED_ALL:mode>_ptest): New pattern.
(*fcm<cmp_op><mode>): Rename to...
(@aarch64_pred_fcm<cmp_op><mode>): ...this. Make operand order
match @aarch64_pred_cmp<cmp_op><SVE_I:mode>.
(*fcmuo<mode>): Rename to...
(@aarch64_pred_fcmuo<mode>): ...this. Make operand order
match @aarch64_pred_cmp<cmp_op><SVE_I:mode>.
(@aarch64_pred_fac<cmp_op><mode>): New expander.
(@vcond_mask_<PRED_ALL:mode><mode>): New pattern.
(fold_extract_last_<mode>): Generalize to...
(@fold_extract_<last_op>_<mode>): ...this.
(@aarch64_fold_extract_vector_<last_op>_<mode>): New pattern.
(*reduc_plus_scal_<SVE_I:mode>): Replace with...
(@aarch64_pred_reduc_<optab>_<mode>): ...this pattern, making the
DImode result explicit.
(reduc_plus_scal_<mode>): Update accordingly.
(*reduc_<optab>_scal_<SVE_I:mode>): Rename to...
(@aarch64_pred_reduc_<optab>_<SVE_I:mode>): ...this.
(*reduc_<optab>_scal_<SVE_F:mode>): Rename to...
(@aarch64_pred_reduc_<optab>_<SVE_F:mode>): ...this.
(*aarch64_sve_tbl<mode>): Rename to...
(@aarch64_sve_tbl<mode>): ...this.
(@aarch64_sve_compact<mode>): New pattern.
(*aarch64_sve_dup_lane<mode>): Rename to...
(@aarch64_sve_dup_lane<mode>): ...this.
(@aarch64_sve_dupq_lane<mode>): New pattern.
(@aarch64_sve_splice<mode>): Likewise.
(aarch64_sve_<perm_insn><mode>): Rename to...
(@aarch64_sve_<perm_insn><mode>): ...this.
(*aarch64_sve_ext<mode>): Rename to...
(@aarch64_sve_ext<mode>): ...this.
(aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>): Add a "@" marker.
(*aarch64_sve_<optab>_nontrunc<SVE_F:mode><SVE_HSDI:mode>): Rename
to...
(@aarch64_sve_<optab>_nontrunc<SVE_F:mode><SVE_HSDI:mode>): ...this.
(*aarch64_sve_<optab>_trunc<VNx2DF_ONLY:mode><VNx4SI_ONLY:mode>):
Rename to...
(@aarch64_sve_<optab>_trunc<VNx2DF_ONLY:mode><VNx4SI_ONLY:mode>):
...this.
(@cond_<optab>_nontrunc<SVE_F:mode><SVE_HSDI:mode>): New expander.
(@cond_<optab>_trunc<VNx2DF_ONLY:mode><VNx4SI_ONLY:mode>): Likewise.
(*cond_<optab>_trunc<VNx2DF_ONLY:mode><VNx4SI_ONLY:mode>): New pattern.
(*aarch64_sve_<optab>_nonextend<SVE_HSDI:mode><SVE_F:mode>): Rename
to...
(@aarch64_sve_<optab>_nonextend<SVE_HSDI:mode><SVE_F:mode>): ...this.
(aarch64_sve_<optab>_extend<VNx4SI_ONLY:mode><VNx2DF_ONLY:mode>): Add
a "@" marker.
(@cond_<optab>_nonextend<SVE_HSDI:mode><SVE_F:mode>): New expander.
(@cond_<optab>_extend<VNx4SI_ONLY:mode><VNx2DF_ONLY:mode>): Likewise.
(*cond_<optab>_extend<VNx4SI_ONLY:mode><VNx2DF_ONLY:mode>): New
pattern.
(*aarch64_sve_<optab>_trunc<SVE_SDF:mode><SVE_HSF:mode>): Rename to...
(@aarch64_sve_<optab>_trunc<SVE_SDF:mode><SVE_HSF:mode>): ...this.
(@cond_<optab>_trunc<SVE_SDF:mode><SVE_HSF:mode>): New expander.
(*cond_<optab>_trunc<SVE_SDF:mode><SVE_HSF:mode>): New pattern.
(aarch64_sve_<optab>_nontrunc<SVE_HSF:mode><SVE_SDF:mode>): Add a
"@" marker.
(@cond_<optab>_nontrunc<SVE_HSF:mode><SVE_SDF:mode>): New expander.
(*cond_<optab>_nontrunc<SVE_HSF:mode><SVE_SDF:mode>): New pattern.
(aarch64_sve_punpk<perm_hilo>_<mode>): Add a "@" marker.
(@aarch64_brk<SVE_BRK_UNARY:brk_op>): New pattern.
(*aarch64_brk<SVE_BRK_UNARY:brk_op>_cc): Likewise.
(*aarch64_brk<SVE_BRK_UNARY:brk_op>_ptest): Likewise.
(@aarch64_brk<SVE_BRK_BINARY:brk_op>): Likewise.
(*aarch64_brk<SVE_BRK_BINARY:brk_op>_cc): Likewise.
(*aarch64_brk<SVE_BRK_BINARY:brk_op>_ptest): Likewise.
(@aarch64_sve_<SVE_PITER:sve_pred_op><mode>): Likewise.
(*aarch64_sve_<SVE_PITER:sve_pred_op><mode>_cc): Likewise.
(*aarch64_sve_<SVE_PITER:sve_pred_op><mode>_ptest): Likewise.
(aarch64_sve_cnt_pat): Likewise.
(@aarch64_sve_<ANY_PLUS:inc_dec><DI_ONLY:mode>_pat): Likewise.
(*aarch64_sve_incsi_pat): Likewise.
(@aarch64_sve_<SAT_PLUS:inc_dec><SI_ONLY:mode>_pat): Likewise.
(@aarch64_sve_<ANY_PLUS:inc_dec><VNx2DI_ONLY:mode>_pat): Likewise.
(@aarch64_sve_<ANY_PLUS:inc_dec><VNx4SI_ONLY:mode>_pat): Likewise.
(@aarch64_sve_<ANY_PLUS:inc_dec><VNx8HI_ONLY:mode>_pat): New expander.
(*aarch64_sve_<ANY_PLUS:inc_dec><VNx8HI_ONLY:mode>_pat): New pattern.
(@aarch64_sve_<ANY_MINUS:inc_dec><DI_ONLY:mode>_pat): Likewise.
(*aarch64_sve_decsi_pat): Likewise.
(@aarch64_sve_<SAT_MINUS:inc_dec><SI_ONLY:mode>_pat): Likewise.
(@aarch64_sve_<ANY_MINUS:inc_dec><VNx2DI_ONLY:mode>_pat): Likewise.
(@aarch64_sve_<ANY_MINUS:inc_dec><VNx4SI_ONLY:mode>_pat): Likewise.
(@aarch64_sve_<ANY_MINUS:inc_dec><VNx8HI_ONLY:mode>_pat): New expander.
(*aarch64_sve_<ANY_MINUS:inc_dec><VNx8HI_ONLY:mode>_pat): New pattern.
(@aarch64_pred_cntp<mode>): Likewise.
(@aarch64_sve_<ANY_PLUS:inc_dec><DI_ONLY:mode><PRED_ALL:mode>_cntp):
New expander.
(*aarch64_sve_<ANY_PLUS:inc_dec><DI_ONLY:mode><PRED_ALL:mode>_cntp)
(*aarch64_incsi<PRED_ALL:mode>_cntp): New patterns.
(@aarch64_sve_<SAT_PLUS:inc_dec><SI_ONLY:mode><PRED_ALL:mode>_cntp):
New expander.
(*aarch64_sve_<SAT_PLUS:inc_dec><SI_ONLY:mode><PRED_ALL:mode>_cntp):
New pattern.
(@aarch64_sve_<ANY_PLUS:inc_dec><VNx2DI_ONLY:mode>_cntp): New expander.
(*aarch64_sve_<ANY_PLUS:inc_dec><VNx2DI_ONLY:mode>_cntp): New pattern.
(@aarch64_sve_<ANY_PLUS:inc_dec><VNx4SI_ONLY:mode>_cntp): New expander.
(*aarch64_sve_<ANY_PLUS:inc_dec><VNx4SI_ONLY:mode>_cntp): New pattern.
(@aarch64_sve_<ANY_PLUS:inc_dec><VNx8HI_ONLY:mode>_cntp): New expander.
(*aarch64_sve_<ANY_PLUS:inc_dec><VNx8HI_ONLY:mode>_cntp): New pattern.
(@aarch64_sve_<ANY_MINUS:inc_dec><DI_ONLY:mode><PRED_ALL:mode>_cntp):
New expander.
(*aarch64_sve_<ANY_MINUS:inc_dec><DI_ONLY:mode><PRED_ALL:mode>_cntp)
(*aarch64_decsi<PRED_ALL:mode>_cntp): New patterns.
(@aarch64_sve_<SAT_MINUS:inc_dec><SI_ONLY:mode><PRED_ALL:mode>_cntp):
New expander.
(*aarch64_sve_<SAT_MINUS:inc_dec><SI_ONLY:mode><PRED_ALL:mode>_cntp):
New pattern.
(@aarch64_sve_<ANY_MINUS:inc_dec><VNx2DI_ONLY:mode>_cntp): New
expander.
(*aarch64_sve_<ANY_MINUS:inc_dec><VNx2DI_ONLY:mode>_cntp): New pattern.
(@aarch64_sve_<ANY_MINUS:inc_dec><VNx4SI_ONLY:mode>_cntp): New
expander.
(*aarch64_sve_<ANY_MINUS:inc_dec><VNx4SI_ONLY:mode>_cntp): New pattern.
(@aarch64_sve_<ANY_MINUS:inc_dec><VNx8HI_ONLY:mode>_cntp): New
expander.
(*aarch64_sve_<ANY_MINUS:inc_dec><VNx8HI_ONLY:mode>_cntp): New pattern.
* config/aarch64/arm_sve.h: New file.
* config/aarch64/aarch64-sve-builtins.h: Likewise.
* config/aarch64/aarch64-sve-builtins.cc: Likewise.
* config/aarch64/aarch64-sve-builtins.def: Likewise.
* config/aarch64/aarch64-sve-builtins-base.h: Likewise.
* config/aarch64/aarch64-sve-builtins-base.cc: Likewise.
* config/aarch64/aarch64-sve-builtins-base.def: Likewise.
* config/aarch64/aarch64-sve-builtins-functions.h: Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.h: Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc: Likewise.
gcc/testsuite/
* g++.target/aarch64/sve/acle/aarch64-sve-acle.exp: New file.
* g++.target/aarch64/sve/acle/general-c++: New test directory.
* gcc.target/aarch64/sve/acle/aarch64-sve-acle.exp: New file.
* gcc.target/aarch64/sve/acle/general: New test directory.
* gcc.target/aarch64/sve/acle/general-c: Likewise.
Co-Authored-By: Kugan Vivekanandarajah <kuganv@linaro.org>
Co-Authored-By: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
From-SVN: r277563
|
|
The SVE ACLE has convenience functions that take scalar arguments
instead of vectors. This patch makes it easier to implement the shift
and compare functions by making the associated immediate queries work
for scalar immediates as well as vector duplicates of them.
The "const" codes in the predicates were a holdover from an early
version of the SVE port in which we used (const ...) wrappers for
variable-length vector constants. I'll remove other instances
of them in a separate patch.
2019-10-29 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64.c (aarch64_sve_cmp_immediate_p)
(aarch64_simd_shift_imm_p): Accept scalars as well as vectors.
* config/aarch64/predicates.md (aarch64_sve_cmp_vsc_immediate)
(aarch64_sve_cmp_vsd_immediate): Accept "const_int", but don't
accept "const".
From-SVN: r277556
|
|
gcc/ChangeLog:
2019-08-19 Joel Hutton <Joel.Hutton@arm.com>
* config/aarch64/aarch64-protos.h (aarch64_fpconst_pow2_recip): New prototype.
* config/aarch64/aarch64.c (aarch64_fpconst_pow2_recip): New function.
* config/aarch64/aarch64.md (*aarch64_<su_optab>cvtf<fcvt_target><GPF:mode>2_mult): New pattern.
(*aarch64_<su_optab>cvtf<fcvt_iesize><GPF:mode>2_mult): New pattern.
* config/aarch64/constraints.md (Dt): New constraint.
* config/aarch64/predicates.md (aarch64_fpconst_pow2_recip): New predicate.
gcc/testsuite/ChangeLog:
2019-08-19 Joel Hutton <Joel.Hutton@arm.com>
* gcc.target/aarch64/fmul_scvtf_1.c: New test.
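As a hedged illustration of the fold (the exact instruction selection may
vary), a multiplication by the reciprocal of a power of two becomes a
single fixed-point convert:
    float
    f (int x)
    {
      /* (float) x * (1.0f / 65536.0f); e.g. scvtf s0, w0, #16.  */
      return (float) x / 65536.0f;
    }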
From-SVN: r274676
|
|
The scalar addition patterns allowed all the VL constants that
ADDVL and ADDPL allow, but wrote the instructions as INC or DEC
if possible (i.e. adding or subtracting a number of elements * [1, 16]
when the source and target registers are the same). That works for the
cases that the autovectoriser needs, but there are a few constants
that INC and DEC can handle but ADDPL and ADDVL can't. E.g.:
    inch x0, all, mul #9
adds an amount that is not a multiple of the number of bytes in an SVE
register, and so can't use ADDVL. The amount is 36 times the number of
bytes in an SVE predicate, putting it outside the range of ADDPL.
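To spell out the arithmetic: with VL bytes per SVE vector, INCH with
"mul #9" adds 9 * VL/2 = 4.5 * VL bytes, not a whole multiple of VL;
and since a predicate holds VL/8 bytes, 4.5 * VL = 36 * (VL/8), beyond
ADDPL's multiplier range of [-32, 31].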
This patch therefore adds separate alternatives for INC and DEC,
tied to a new Uai constraint. It also adds an explicit "scalar"
or "vector" to the function names, to avoid a clash with the
existing support for vector INC and DEC.
2019-08-15 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64-protos.h
(aarch64_sve_scalar_inc_dec_immediate_p): Declare.
(aarch64_sve_inc_dec_immediate_p): Rename to...
(aarch64_sve_vector_inc_dec_immediate_p): ...this.
(aarch64_output_sve_addvl_addpl): Take a single rtx argument.
(aarch64_output_sve_scalar_inc_dec): Declare.
(aarch64_output_sve_inc_dec_immediate): Rename to...
(aarch64_output_sve_vector_inc_dec): ...this.
* config/aarch64/aarch64.c (aarch64_sve_scalar_inc_dec_immediate_p)
(aarch64_output_sve_scalar_inc_dec): New functions.
(aarch64_output_sve_addvl_addpl): Remove the base and offset
arguments. Only handle true ADDVL and ADDPL instructions;
don't emit an INC or DEC.
(aarch64_sve_inc_dec_immediate_p): Rename to...
(aarch64_sve_vector_inc_dec_immediate_p): ...this.
(aarch64_output_sve_inc_dec_immediate): Rename to...
(aarch64_output_sve_vector_inc_dec): ...this. Update call to
aarch64_sve_vector_inc_dec_immediate_p.
* config/aarch64/predicates.md (aarch64_sve_scalar_inc_dec_immediate)
(aarch64_sve_plus_immediate): New predicates.
(aarch64_pluslong_operand): Accept aarch64_sve_plus_immediate
rather than aarch64_sve_addvl_addpl_immediate.
(aarch64_sve_inc_dec_immediate): Rename to...
(aarch64_sve_vector_inc_dec_immediate): ...this. Update call to
aarch64_sve_vector_inc_dec_immediate_p.
(aarch64_sve_add_operand): Update accordingly.
* config/aarch64/constraints.md (Uai): New constraint.
(vsi): Update call to aarch64_sve_vector_inc_dec_immediate_p.
* config/aarch64/aarch64.md (add<GPI:mode>3): Don't force the second
operand into a register if it satisfies aarch64_sve_plus_immediate.
(*add<GPI:mode>3_aarch64, *add<GPI:mode>3_poly_1): Add an alternative
for Uai. Update calls to aarch64_output_sve_addvl_addpl.
* config/aarch64/aarch64-sve.md (add<mode>3): Call
aarch64_output_sve_vector_inc_dec instead of
aarch64_output_sve_inc_dec_immediate.
From-SVN: r274518
|
|
This patch lets us use the immediate forms of FADD, FSUB, FSUBR,
FMUL, FMAXNM and FMINNM for conditional arithmetic. (We already
use them for normal unconditional arithmetic.)
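A hedged example of the conditional arithmetic this enables (FADD's
immediate form accepts 0.5 and 1.0):
    void
    f (float *x, int n)
    {
      for (int i = 0; i < n; i++)
        if (x[i] > 0.0f)
          x[i] += 1.0f;   /* predicated FADD with immediate #1.0 */
    }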
2019-08-15 Richard Sandiford <richard.sandiford@arm.com>
Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
gcc/
* config/aarch64/aarch64.c (aarch64_print_vector_float_operand):
Print 2.0 naturally.
(aarch64_sve_float_mul_immediate_p): Return true for 2.0.
* config/aarch64/predicates.md
(aarch64_sve_float_negated_arith_immediate): New predicate,
renamed from aarch64_sve_float_arith_with_sub_immediate.
(aarch64_sve_float_arith_with_sub_immediate): Test for both
positive and negative constants.
(aarch64_sve_float_arith_with_sub_operand): Redefine as a register
or an aarch64_sve_float_arith_with_sub_immediate.
* config/aarch64/constraints.md (vsN): Use
aarch64_sve_float_negated_arith_immediate.
* config/aarch64/iterators.md (SVE_COND_FP_BINARY_I1): New int
iterator.
(sve_pred_fp_rhs2_immediate): New int attribute.
* config/aarch64/aarch64-sve.md
(cond_<SVE_COND_FP_BINARY:optab><SVE_F:mode>): Use
sve_pred_fp_rhs1_operand and sve_pred_fp_rhs2_operand.
(*cond_<SVE_COND_FP_BINARY_I1:optab><SVE_F:mode>_2_const)
(*cond_<SVE_COND_FP_BINARY_I1:optab><SVE_F:mode>_any_const)
(*cond_add<SVE_F:mode>_2_const, *cond_add<SVE_F:mode>_any_const)
(*cond_sub<mode>_3_const, *cond_sub<mode>_any_const): New patterns.
gcc/testsuite/
* gcc.target/aarch64/sve/cond_fadd_1.c: New test.
* gcc.target/aarch64/sve/cond_fadd_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fadd_2.c: Likewise.
* gcc.target/aarch64/sve/cond_fadd_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fadd_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fadd_3_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fadd_4.c: Likewise.
* gcc.target/aarch64/sve/cond_fadd_4_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fsubr_1.c: Likewise.
* gcc.target/aarch64/sve/cond_fsubr_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fsubr_2.c: Likewise.
* gcc.target/aarch64/sve/cond_fsubr_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fsubr_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fsubr_3_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fsubr_4.c: Likewise.
* gcc.target/aarch64/sve/cond_fsubr_4_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_1.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_2.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_3_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_4.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_4_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_1.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_2.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_3_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_4.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_4_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmul_1.c: Likewise.
* gcc.target/aarch64/sve/cond_fmul_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmul_2.c: Likewise.
* gcc.target/aarch64/sve/cond_fmul_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmul_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fmul_3_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmul_4.c: Likewise.
* gcc.target/aarch64/sve/cond_fmul_4_run.c: Likewise.
Co-Authored-By: Kugan Vivekanandarajah <kuganv@linaro.org>
From-SVN: r274508
|
|
UXTB, UXTH and UXTW are equivalent to predicated ANDs with the constants
0xff, 0xffff and 0xffffffff respectively. This patch uses them in the
patterns for IFN_COND_AND.
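A hedged example of a conditional AND that can now be emitted as a
predicated UXTB:
    void
    f (unsigned int *x, unsigned int *y, int n)
    {
      for (int i = 0; i < n; i++)
        if (y[i] > 10)
          x[i] &= 0xff;   /* UXTB on the active lanes */
    }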
2019-08-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64.c (aarch64_print_operand): Allow %e to
take the equivalent mask, as well as a bit count.
* config/aarch64/predicates.md (aarch64_sve_uxtb_immediate)
(aarch64_sve_uxth_immediate, aarch64_sve_uxt_immediate)
(aarch64_sve_pred_and_operand): New predicates.
* config/aarch64/iterators.md (sve_pred_int_rhs2_operand): New
code attribute.
* config/aarch64/aarch64-sve.md
(cond_<SVE_INT_BINARY:optab><SVE_I:mode>): Use it.
(*cond_uxt<mode>_2, *cond_uxt<mode>_any): New patterns.
gcc/testsuite/
* gcc.target/aarch64/sve/cond_uxt_1.c: New test.
* gcc.target/aarch64/sve/cond_uxt_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_2.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_3.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_3_run.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_4.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_4_run.c: Likewise.
From-SVN: r274479
|
|
This patch extends the SVE UNSPEC_SEL patterns so that they can use:
(1) MOV /M of a duplicated integer constant
(2) MOV /M of a duplicated floating-point constant bitcast to an integer,
accepting the same constants as (1)
(3) FMOV /M of a duplicated floating-point constant
(4) MOV /Z of a duplicated integer constant
(5) MOV /Z of a duplicated floating-point constant bitcast to an integer,
accepting the same constants as (4)
(6) MOVPRFXed FMOV /M of a duplicated floating-point constant
We already handled (4) with a special pattern; the rest are new.
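A hedged example exercising case (3), an FMOV /M of a duplicated
floating-point constant:
    void
    f (float *x, int n)
    {
      for (int i = 0; i < n; i++)
        x[i] = x[i] > 0.0f ? 1.0f : x[i];   /* FMOV z.s, p/m, #1.0 */
    }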
2019-08-14 Richard Sandiford <richard.sandiford@arm.com>
Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
gcc/
* config/aarch64/aarch64.c (aarch64_bit_representation): New function.
(aarch64_print_vector_float_operand): Also handle 8-bit floats.
(aarch64_print_operand): Add support for %I.
(aarch64_sve_dup_immediate_p): Handle scalars as well as vectors.
Bitcast floating-point constants to the corresponding integer constant.
(aarch64_float_const_representable_p): Handle vectors as well
as scalars.
(aarch64_expand_sve_vcond): Make sure that the operands are valid
for the new vcond_mask_<mode><vpred> expander.
* config/aarch64/predicates.md (aarch64_sve_dup_immediate): Also
test aarch64_float_const_representable_p.
(aarch64_sve_reg_or_dup_imm): New predicate.
* config/aarch64/aarch64-sve.md (vec_extract<vpred><Vel>): Use
gen_vcond_mask_<mode><vpred> instead of
gen_aarch64_sve_dup<mode>_const.
(vcond_mask_<mode><vpred>): Turn into a define_expand that
accepts aarch64_sve_reg_or_dup_imm and aarch64_simd_reg_or_zero
for operands 1 and 2 respectively. Force operand 2 into a
register if operand 1 is a register. Fold old define_insn...
(aarch64_sve_dup<mode>_const): ...and this define_insn...
(*vcond_mask_<mode><vpred>): ...into this new pattern. Handle
floating-point constants that can be moved as integers. Add
alternatives for MOV /M and FMOV /M.
(vcond<mode><v_int_equiv>, vcondu<mode><v_int_equiv>)
(vcond<mode><v_fp_equiv>): Accept nonmemory_operand for operands
1 and 2 respectively.
* config/aarch64/constraints.md (Ufc): Handle vectors as well
as scalars.
(vss): New constraint.
gcc/testsuite/
* gcc.target/aarch64/sve/vcond_18.c: New test.
* gcc.target/aarch64/sve/vcond_18_run.c: Likewise.
* gcc.target/aarch64/sve/vcond_19.c: Likewise.
* gcc.target/aarch64/sve/vcond_19_run.c: Likewise.
* gcc.target/aarch64/sve/vcond_20.c: Likewise.
* gcc.target/aarch64/sve/vcond_20_run.c: Likewise.
Co-Authored-By: Kugan Vivekanandarajah <kuganv@linaro.org>
From-SVN: r274441
|
|
This patch uses the immediate forms of FMAXNM and FMINNM for
unconditional arithmetic.
The same rules apply to FMAX and FMIN, but we only generate those
via the ACLE.
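For illustration, here is my sketch (not one of the new tests) of a loop
that can now use the immediate encoding, assuming the usual restriction
that the FP min/max immediates are 0.0 and 1.0:
```
void
clamp_low (double *x, int n)
{
  for (int i = 0; i < n; ++i)
    x[i] = __builtin_fmax (x[i], 0.0);   /* FMAXNM with immediate #0.0 */
}
```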
2019-08-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/predicates.md (aarch64_sve_float_maxmin_immediate)
(aarch64_sve_float_maxmin_operand): New predicates.
* config/aarch64/constraints.md (vsB): New constraint.
(vsM): Fix typo.
* config/aarch64/iterators.md (sve_pred_fp_rhs2_operand): Use
aarch64_sve_float_maxmin_operand for UNSPEC_COND_FMAXNM and
UNSPEC_COND_FMINNM.
* config/aarch64/aarch64-sve.md (<maxmin_uns><SVE_F:mode>3):
Use aarch64_sve_float_maxmin_operand for operand 2.
(*<SVE_COND_FP_MAXMIN_PUBLIC:optab><SVE_F:mode>3): Likewise.
Add alternatives for the constant forms.
gcc/testsuite/
* gcc.target/aarch64/sve/fmaxnm_1.c: New test.
* gcc.target/aarch64/sve/fminnm_1.c: Likewise.
From-SVN: r274440
|
|
This patch adds support for the immediate forms of SVE SMAX, SMIN, UMAX
and UMIN. SMAX and SMIN take the same range as MUL, so the patch
basically just moves and generalises the existing MUL patterns.
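A quick sketch of code that benefits (mine, not from the testsuite); the
immediate here fits the signed -128..127 range shared with MUL:
```
void
clamp_high (signed char *x, int n)
{
  for (int i = 0; i < n; ++i)
    x[i] = x[i] < 100 ? x[i] : 100;   /* SMIN with immediate #100 */
}
```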
2019-08-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/constraints.md (vsb): New constraint.
(vsm): Generalize description.
* config/aarch64/iterators.md (SVE_INT_BINARY_IMM): New code
iterator.
(sve_imm_con): Handle smax, smin, umax and umin.
(sve_imm_prefix): New code attribute.
* config/aarch64/predicates.md (aarch64_sve_vsb_immediate)
(aarch64_sve_vsb_operand): New predicates.
(aarch64_sve_mul_immediate): Rename to...
(aarch64_sve_vsm_immediate): ...this.
(aarch64_sve_mul_operand): Rename to...
(aarch64_sve_vsm_operand): ...this.
* config/aarch64/aarch64-sve.md (mul<mode>3): Generalize to...
(<SVE_INT_BINARY_IMM:optab><SVE_I:mode>3): ...this.
(*mul<mode>3, *post_ra_mul<mode>3): Generalize to...
(*<SVE_INT_BINARY_IMM:optab><SVE_I:mode>3)
(*post_ra_<SVE_INT_BINARY_IMM:optab><SVE_I:mode>3): ...these and
add movprfx support for the immediate alternatives.
(<su><maxmin><mode>3, *<su><maxmin><mode>3): Delete in favor
of the above.
(*<SVE_INT_BINARY_SD:optab><SVE_SDI:mode>3): Fix incorrect predicate
for operand 3.
gcc/testsuite/
* gcc.target/aarch64/sve/smax_1.c: New test.
* gcc.target/aarch64/sve/smin_1.c: Likewise.
* gcc.target/aarch64/sve/umax_1.c: Likewise.
* gcc.target/aarch64/sve/umin_1.c: Likewise.
From-SVN: r274439
|
|
This patch adds support for predicated and unpredicated CNOT
(logical NOT on integers). In RTL terms, this is a select between
1 and 0 in which the predicate is fed by a comparison with zero.
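For example (my sketch, not one of the new tests), a plain logical-not
loop has exactly this shape:
```
void
logical_not (int *x, int n)
{
  for (int i = 0; i < n; ++i)
    x[i] = !x[i];   /* 1-vs-0 select keyed on a compare with zero */
}
```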
2019-08-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/predicates.md (aarch64_simd_imm_one): New predicate.
* config/aarch64/aarch64-sve.md (*cnot<mode>): New pattern.
(*cond_cnot<mode>_2, *cond_cnot<mode>_any): Likewise.
gcc/testsuite/
* gcc.target/aarch64/sve/cnot_1.c: New test.
* gcc.target/aarch64/sve/cond_cnot_1.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_2.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_3.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_3_run.c: Likewise.
From-SVN: r274438
|
|
This patch uses SVE ADR to optimise shift-and-add and uxtw-and-add
sequences.
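The kind of sequence this catches looks like the following (my sketch):
an element-wise base plus an index shifted left by 1 to 3 bits:
```
void
shift_add (unsigned long *restrict a, unsigned long *restrict b,
           unsigned long *restrict c, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = b[i] + (c[i] << 2);   /* ADR [z, z, lsl #2] candidate */
}
```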
2019-08-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/predicates.md (const_1_to_3_operand): New predicate.
* config/aarch64/aarch64-sve.md (*aarch64_adr_uxtw)
(*aarch64_adr<mode>_shift, *aarch64_adr_shift_uxtw): New patterns.
gcc/testsuite/
* gcc.target/aarch64/sve/adr_1.c: New test.
* gcc.target/aarch64/sve/adr_1_run.c: Likewise.
* gcc.target/aarch64/sve/adr_2.c: Likewise.
* gcc.target/aarch64/sve/adr_2_run.c: Likewise.
* gcc.target/aarch64/sve/adr_3.c: Likewise.
* gcc.target/aarch64/sve/adr_3_run.c: Likewise.
* gcc.target/aarch64/sve/adr_4.c: Likewise.
* gcc.target/aarch64/sve/adr_4_run.c: Likewise.
* gcc.target/aarch64/sve/adr_5.c: Likewise.
* gcc.target/aarch64/sve/adr_5_run.c: Likewise.
From-SVN: r274436
|
|
This patch makes the SVE unary, binary and ternary FP unspecs
take a new "GP strictness" operand that indicates whether the
predicate has to be taken literally, or whether it is valid to
make extra lanes active (up to and including using a PTRUE).
This again is laying the groundwork for the ACLE patterns,
in which the value can depend on the FP command-line flags.
At the moment it's only needed for addition, subtraction and
multiplication, which have unpredicated forms that can only
be used when operating on all lanes is safe. But in future
it might be useful for optimising predicate usage.
The strict mode requires extra alternatives for addition,
subtraction and multiplication, but I've left those for the
main ACLE patch.
2019-08-14 Richard Sandiford <richard.sandiford@arm.com>
Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
gcc/
* config/aarch64/aarch64.md (SVE_RELAXED_GP, SVE_STRICT_GP): New
constants.
* config/aarch64/predicates.md (aarch64_sve_gp_strictness): New
predicate.
* config/aarch64/aarch64-protos.h (aarch64_sve_pred_dominates_p):
Declare.
* config/aarch64/aarch64.c (aarch64_sve_pred_dominates_p): New
function.
* config/aarch64/aarch64-sve.md: Add a block comment about the
handling of predicated FP operations.
(<SVE_COND_FP_UNARY:optab><SVE_F:mode>2, add<SVE_F:mode>3)
(sub<SVE_F:mode>3, mul<SVE_F:mode>3, div<SVE_F:mode>3)
(<SVE_COND_FP_MAXMIN_PUBLIC:optab><SVE_F:mode>3)
(<SVE_COND_FP_MAXMIN_PUBLIC:maxmin_uns><SVE_F:mode>3)
(<SVE_COND_FP_TERNARY:optab><SVE_F:mode>4): Add an SVE_RELAXED_GP
operand.
(cond_<SVE_COND_FP_BINARY:optab><SVE_F:mode>)
(cond_<SVE_COND_FP_TERNARY:optab><SVE_F:mode>): Add an SVE_STRICT_GP
operand.
(*<SVE_COND_FP_UNARY:optab><SVE_F:mode>2)
(*cond_<SVE_COND_FP_BINARY:optab><SVE_F:mode>_2)
(*cond_<SVE_COND_FP_BINARY:optab><SVE_F:mode>_3)
(*cond_<SVE_COND_FP_BINARY:optab><SVE_F:mode>_any)
(*fabd<SVE_F:mode>3, *div<SVE_F:mode>3)
(*<SVE_COND_FP_MAXMIN_PUBLIC:optab><SVE_F:mode>3)
(*<SVE_COND_FP_TERNARY:optab><SVE_F:mode>4)
(*cond_<SVE_COND_FP_TERNARY:optab><SVE_F:mode>_2)
(*cond_<SVE_COND_FP_TERNARY:optab><SVE_F:mode>_4)
(*cond_<SVE_COND_FP_TERNARY:optab><SVE_F:mode>_any): Match the
strictness operands. Use aarch64_sve_pred_dominates_p to check
whether the predicate on the conditional operation is suitable
for merging. Split patterns into the canonical equal-predicate form.
(*add<SVE_F:mode>3, *sub<SVE_F:mode>3, *mul<SVE_F:mode>3): Likewise.
Restrict the unpredicated alternatives to SVE_RELAXED_GP.
Co-Authored-By: Kugan Vivekanandarajah <kuganv@linaro.org>
From-SVN: r274418
|
|
This patch reworks the rtl representation of the SVE PTEST operation
so that:
- the governing predicate is always VNx16BI (and so all bits are defined)
- it is still possible to pattern-match the governing predicate in the
mode that it had previously
- a new hint operand says whether the governing predicate is known to be
all true for the element size of interest, rather than this being part
of the unspec name.
These changes make it easier to handle more flag-setting instructions
as part of the ACLE work.
See the comment in aarch64-sve.md for more details.
2019-08-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64-protos.h (aarch64_ptrue_all): Declare.
* config/aarch64/aarch64.c (aarch64_ptrue_all): New function.
* config/aarch64/aarch64.md (UNSPEC_PTEST_PTRUE): Delete.
(UNSPEC_PTEST): New unspec.
(SVE_MAYBE_NOT_PTRUE, SVE_KNOWN_PTRUE): New constants.
* config/aarch64/iterators.md (data_bytes): New mode attribute.
* config/aarch64/predicates.md (aarch64_sve_ptrue_flag): New predicate.
* config/aarch64/aarch64-sve.md: Add a new section describing the
handling of UNSPEC_PTEST.
(pred_<LOGICAL:optab><PRED_ALL:mode>3): Rename to...
(@aarch64_pred_<LOGICAL:optab><PRED_ALL:mode>_z): ...this.
(ptest_ptrue<mode>): Replace with...
(aarch64_ptest<mode>): ...this new pattern.
(cbranch<mode>4): Update after above changes.
(*<LOGICAL:optab><PRED_ALL:mode>3_cc): Use UNSPEC_PTEST instead of
UNSPEC_PTEST_PTRUE.
(*cmp<SVE_INT_CMP:cmp_op><SVE_I:mode>_cc): Likewise.
(*cmp<SVE_INT_CMP:cmp_op><SVE_I:mode>_ptest): Likewise.
(*while_ult<GPI:mode><PRED_ALL:mode>_cc): Likewise.
From-SVN: r274414
|
|
If there's no SVE instruction to load a given constant directly, this
patch instead tries to use an Advanced SIMD constant move and then
duplicates the constant to fill an SVE vector. The main use of this
is to support constants in which each byte is in { 0, 0xff }.
Also, the patch prefers a simple integer move followed by a duplicate
over a load from memory, like we already do for Advanced SIMD. This is
a useful option to have and would be easy to turn off via a tuning
parameter if necessary.
The patch also extends the handling of wide LD1Rs to big endian,
whereas previously we punted to a full LD1RQ.
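As an illustration (my sketch, not from the new tests), the constant
below duplicates to a vector whose bytes are all 0 or 0xff, so it can be
built with an Advanced SIMD move plus a duplicate instead of a load:
```
void
mask_bytes (unsigned int *x, int n)
{
  for (int i = 0; i < n; ++i)
    x[i] &= 0x00ff00ff;   /* every byte of the constant is 0 or 0xff */
}
```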
2019-08-13 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* machmode.h (opt_mode::else_mode): New function.
(opt_mode::else_blk): Use it.
* config/aarch64/aarch64-protos.h (aarch64_vq_mode): Declare.
(aarch64_full_sve_mode, aarch64_sve_ld1rq_operand_p): Likewise.
(aarch64_gen_stepped_int_parallel): Likewise.
(aarch64_stepped_int_parallel_p): Likewise.
(aarch64_expand_mov_immediate): Remove the optional gen_vec_duplicate
argument.
* config/aarch64/aarch64.c
(aarch64_expand_sve_widened_duplicate): Delete.
(aarch64_expand_sve_dupq, aarch64_expand_sve_ld1rq): New functions.
(aarch64_expand_sve_const_vector): Rewrite to handle more cases.
(aarch64_expand_mov_immediate): Remove the optional gen_vec_duplicate
argument. Use early returns in the !CONST_INT_P handling.
Pass all SVE data vectors to aarch64_expand_sve_const_vector rather
than handling some inline.
(aarch64_full_sve_mode, aarch64_vq_mode): New functions, split out
from...
(aarch64_simd_container_mode): ...here.
(aarch64_gen_stepped_int_parallel, aarch64_stepped_int_parallel_p)
(aarch64_sve_ld1rq_operand_p): New functions.
* config/aarch64/predicates.md (descending_int_parallel)
(aarch64_sve_ld1rq_operand): New predicates.
* config/aarch64/constraints.md (UtQ): New constraint.
* config/aarch64/aarch64.md (UNSPEC_REINTERPRET): New unspec.
* config/aarch64/aarch64-sve.md (mov<SVE_ALL:mode>): Remove the
gen_vec_duplicate from call to aarch64_expand_mov_immediate.
(@aarch64_sve_reinterpret<mode>): New expander.
(*aarch64_sve_reinterpret<mode>): New pattern.
(@aarch64_vec_duplicate_vq<mode>_le): New pattern.
(@aarch64_vec_duplicate_vq<mode>_be): Likewise.
(*sve_ld1rq<Vesize>): Replace with...
(@aarch64_sve_ld1rq<mode>): ...this new pattern.
gcc/testsuite/
* gcc.target/aarch64/sve/init_2.c: Expect ld1rd to be used
instead of a full vector load.
* gcc.target/aarch64/sve/init_4.c: Likewise.
* gcc.target/aarch64/sve/ld1r_2.c: Remove constants that no longer
need to be loaded from memory.
* gcc.target/aarch64/sve/slp_2.c: Expect the same output for
big and little endian.
* gcc.target/aarch64/sve/slp_3.c: Likewise. Expect 3 of the
doubles to be moved via integer registers rather than loaded
from memory.
* gcc.target/aarch64/sve/slp_4.c: Likewise but for 4 doubles.
* gcc.target/aarch64/sve/spill_4.c: Expect 16-bit constants to be
loaded via an integer register rather than from memory.
* gcc.target/aarch64/sve/const_1.c: New test.
* gcc.target/aarch64/sve/const_2.c: Likewise.
* gcc.target/aarch64/sve/const_3.c: Likewise.
From-SVN: r274375
|
|
Some indexed SVE FCMLA operations have a 3-bit register field that
requires one of Z0-Z7. This patch adds a public "y" constraint for that.
The patch also documents "x", which is again intended to be a public
constraint.
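A minimal sketch of how the constraints are used from inline asm (mine,
in the spirit of the new tests rather than verbatim):
```
#include <arm_neon.h>

float32x4_t
force_low (float32x4_t a)
{
  /* "y" restricts 'a' to V0-V7; "x" would allow V0-V15.  */
  asm ("" : "+y" (a));
  return a;
}
```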
2019-08-13 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* doc/md.texi: Document the x and y constraints for AArch64.
* config/aarch64/aarch64.h (FP_LO8_REGNUM_P): New macro.
(FP_LO8_REGS): New reg_class.
(REG_CLASS_NAMES, REG_CLASS_CONTENTS): Add an entry for FP_LO8_REGS.
* config/aarch64/aarch64.c (aarch64_hard_regno_nregs)
(aarch64_regno_regclass, aarch64_class_max_nregs): Handle FP_LO8_REGS.
* config/aarch64/predicates.md (aarch64_simd_register): Use
FP_REGNUM_P instead of checking the classes manually.
* config/aarch64/constraints.md (y): New constraint.
gcc/testsuite/
* gcc.target/aarch64/asm-x-constraint-1.c: New test.
* gcc.target/aarch64/asm-y-constraint-1.c: Likewise.
From-SVN: r274367
|
|
We used INSR to handle zero integers but not zero floats.
2019-08-07 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/constraints.md (Z): Handle floating-point zeros too.
* config/aarch64/predicates.md (aarch64_reg_or_zero): Likewise.
gcc/testsuite/
* gcc.target/aarch64/sve/init_13.c: New test.
From-SVN: r274193
|
|
The recent AArch64 absolute difference patterns had to go through
some hoops to pair max/min rtx codes with the same signedness.
I also need to pair signed/unsigned codes with sign/zero extension
for some SVE ACLE patterns.
This patch therefore supports <...> as rtx codes, like we already
do for modes.
2019-05-12 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* doc/md.texi: Document use of code attributes in rtx patterns.
* read-md.h (rtx_reader::rtx_alloc_for_name): New member function.
* read-rtl.c (find_code): Split out search loops into...
(maybe_find_code): ...this new function.
(check_code_iterator): Make the error message more informative.
(check_code_attribute): New function.
(rtx_reader::rtx_alloc_for_name): Likewise.
(rtx_reader::read_rtx_code): Use rtx_alloc_for_name.
* config/aarch64/predicates.md (aarch64_smin, aarch64_umin): Delete.
* config/aarch64/aarch64-simd.md (*aarch64_<su>abd<mode>_3): Use
<max_opp> directly as an rtx code instead of via a match_operator.
* config/aarch64/aarch64-sve.md (aarch64_<su>abd<mode>_3): Likewise.
(<su>abd<mode>_3): Update accordingly.
From-SVN: r271107
|
|
Many of the SUBS/ADDS patterns in aarch64.md did not handle the stack
pointer correctly.
Both the "extended register" and "immediate" forms allow the stack pointer to be
used as the source register, while no form allows the stack pointer for the
destination register.
The define_insn patterns generating ADDS/SUBS did not allow the stack pointer
for any operand, while the define_peephole2 patterns that generated RTX to be
matched by these patterns allowed the stack pointer for any operand.
The patterns are fixed by adding the 'k' constraint for the first source operand
to all define_insns that generate the ADDS/SUBS "extended register" and
"immediate" forms (but not the "shifted register" form).
In peephole optimizations, constraint strings are ignored (see "(gccint) C
Constraint Interface" info node in the documentation), so the decision to act or
not is based solely on the predicate and condition.
This patch introduces a new predicate "aarch64_general_reg" to be used in
define_peephole2 patterns where only GENERAL_REGS registers are acceptable and
uses that predicate in the peepholes that generate patterns for ADDS/SUBS.
Full bootstrap and regtest done on aarch64-none-linux-gnu.
Regression tests done on aarch64-none-linux-gnu and aarch64-none-elf cross
compiler.
OK for trunk?
gcc/ChangeLog:
2019-02-22 Matthew Malcomson <matthew.malcomson@arm.com>
PR target/89324
* config/aarch64/aarch64.md: Use aarch64_general_reg predicate on
destination register in peepholes generating patterns for ADDS/SUBS.
(add<mode>3_compare0,
*addsi3_compare0_uxtw, add<mode>3_compareC,
add<mode>3_compareV_imm, add<mode>3_compareV,
*adds_<optab><ALLX:mode>_<GPI:mode>,
*subs_<optab><ALLX:mode>_<GPI:mode>,
*adds_<optab><ALLX:mode>_shift_<GPI:mode>,
*subs_<optab><ALLX:mode>_shift_<GPI:mode>,
*adds_<optab><mode>_multp2, *subs_<optab><mode>_multp2,
*sub<mode>3_compare0, *subsi3_compare0_uxtw,
sub<mode>3_compare1): Allow stack pointer for source register.
* config/aarch64/predicates.md (aarch64_general_reg): New predicate.
gcc/testsuite/ChangeLog:
2019-02-22 Matthew Malcomson <matthew.malcomson@arm.com>
PR target/89324
* gcc.dg/rtl/aarch64/subs_adds_sp.c: New test.
* gfortran.fortran-torture/compile/pr89324.f90: New test.
From-SVN: r269122
|
|
Richard raised a concern about the RTL we use to represent the AdvSIMD SABD
(vector signed absolute difference) instruction.
We currently represent it as ABS (MINUS op1 op2).
This isn't exactly what SABD does. ABS treats its input as a signed value
and returns the absolute value of that.
For example:
(sabd:QI 64 -128) == 192 (unsigned) aka -64 (signed)
whereas
(minus:QI 64 -128) == 192 (unsigned) aka -64 (signed), (abs ...) of that is 64.
A better way to describe the instruction is with MINUS (SMAX (op1 op2) SMIN (op1 op2)).
This patch implements that, and also implements similar semantics for the UABD instruction
that uses UMAX and UMIN.
That way for the example above we'll have:
(minus:QI (smax:QI (64 -128)) (smin:QI (64 -128))) == (minus:QI 64 -128) == 192 (or -64 signed) which matches
what SABD does.
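A loop with this max-minus-min shape (my sketch; cf. the new abd_1.c
test) is what the updated pattern now matches:
```
void
absdiff (signed char *restrict a, signed char *restrict b,
         signed char *restrict out, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = (a[i] > b[i] ? a[i] : b[i]) - (a[i] < b[i] ? a[i] : b[i]);
}
```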
* config/aarch64/iterators.md (max_opp): New code_attr.
(USMAX): New code iterator.
* config/aarch64/predicates.md (aarch64_smin): New predicate.
(aarch64_smax): Likewise.
* config/aarch64/aarch64-simd.md (abd<mode>_3): Rename to...
(*aarch64_<su>abd<mode>_3): ... Change RTL representation to
MINUS (MAX MIN).
* gcc.target/aarch64/abd_1.c: New test.
* gcc.dg/sabd_1.c: Likewise.
From-SVN: r268658
|
|
Further investigation showed that my previous patch for this issue was
still incomplete.
The problem stemmed from what I suspect was a misunderstanding of the
way overflow is calculated on aarch64 when values are subtracted (and
hence in comparisons). In this case, unlike addition, the carry flag
is /cleared/ if there is overflow (technically, underflow) and set
when that does not happen. This patch fixes the issue by using
CCmode for all subtractive operations (this can fully describe the
normal overflow conditions without anything particularly fancy);
tidies up the way we express normal unsigned overflow using CC_Cmode
(the result of a sum is less than one of the operands); and adds a new
mode, CC_ADCmode, to handle expressing overflow of an add-with-carry
operation, where the standard idiom is no longer sufficient to
describe the overflow condition.
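For reference, the unsigned-overflow idiom that CC_Cmode now describes
corresponds to code like this sketch of mine:
```
unsigned long
checked_add (unsigned long a, unsigned long b, int *overflow)
{
  unsigned long sum;
  /* The carry is set iff the truncated sum is less than an operand;
     this maps onto the uaddv<mode>4 pattern and CC_Cmode.  */
  *overflow = __builtin_add_overflow (a, b, &sum);
  return sum;
}
```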
PR target/86891
* config/aarch64/aarch64-modes.def: Add comment about how the carry
bit is set by add and compare.
(CC_ADC): New CC_MODE.
* config/aarch64/aarch64.c (aarch64_select_cc_mode): Use variables
to cache the code and mode of X. Adjust the shape of a CC_Cmode
comparison. Add detection for CC_ADCmode.
(aarch64_get_condition_code_1): Update code support for CC_Cmode. Add
CC_ADCmode.
* config/aarch64/aarch64.md (uaddv<mode>4): Use LTU with CCmode.
(uaddvti4): Comparison result is in CC_ADCmode and the condition is GEU.
(add<mode>3_compareC_cconly_imm): Delete. Merge into...
(add<mode>3_compareC_cconly): ... this. Restructure the comparison
to eliminate the need for zero-extending the operands.
(add<mode>3_compareC_imm): Delete. Merge into ...
(add<mode>3_compareC): ... this. Restructure the comparison to
eliminate the need for zero-extending the operands.
(add<mode>3_carryin): Use LTU for the overflow detection.
(add<mode>3_carryinC): Use CC_ADCmode for the result of the carry out.
Reexpress comparison for overflow.
(add<mode>3_carryinC_zero): Update for change to add<mode>3_carryinC.
(add<mode>3_carryinC): Likewise.
(add<mode>3_carryinV): Use LTU for carry between partials.
* config/aarch64/predicates.md (aarch64_carry_operation): Update
handling of CC_Cmode and add CC_ADCmode.
(aarch64_borrow_operation): Likewise.
From-SVN: r267971
|
|
From-SVN: r267494
|
|
Do not zero-extend the input to the cas for subword operations;
instead, use the appropriate zero-extending compare insns.
Correct the predicates and constraints for the immediate expected operand.
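A subword compare-and-swap that exercises this path looks like the
following (my sketch, not a test from the patch):
```
#include <stdatomic.h>

_Bool
cas_u16 (_Atomic unsigned short *p, unsigned short expected,
         unsigned short desired)
{
  /* The expected value no longer needs an explicit zero-extension;
     a zero-extending compare is used instead.  */
  return atomic_compare_exchange_strong (p, &expected, desired);
}
```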
* config/aarch64/aarch64.c (aarch64_gen_compare_reg_maybe_ze): New.
(aarch64_split_compare_and_swap): Use it.
(aarch64_expand_compare_and_swap): Likewise. Remove convert_modes;
test oldval against the proper predicate.
* config/aarch64/atomics.md (@atomic_compare_and_swap<ALLI>):
Use nonmemory_operand for expected.
(cas_short_expected_pred): New.
(@aarch64_compare_and_swap<SHORT>): Use it; use "rn" not "rI" to match.
(@aarch64_compare_and_swap<GPI>): Use "rn" not "rI" for expected.
* config/aarch64/predicates.md (aarch64_plushi_immediate): New.
(aarch64_plushi_operand): New.
From-SVN: r265657
|
|
Use the STLUR instruction introduced in Armv8.4-a.
This instruction has store-release semantics like STLR but can take a
9-bit unscaled signed immediate offset.
Example test case:
```
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

void
foo (void)
{
  _Atomic int32_t *atomic_vals = calloc (4, sizeof (_Atomic int32_t));
  atomic_store_explicit (atomic_vals + 1, 2, memory_order_release);
}
```
Before patch generates
```
foo:
stp x29, x30, [sp, -16]!
mov x1, 4
mov x0, x1
mov x29, sp
bl calloc
mov w1, 2
add x0, x0, 4
stlr w1, [x0]
ldp x29, x30, [sp], 16
ret
```
After patch generates
```
foo:
stp x29, x30, [sp, -16]!
mov x1, 4
mov x0, x1
mov x29, sp
bl calloc
mov w1, 2
stlur w1, [x0, 4]
ldp x29, x30, [sp], 16
ret
```
We introduce a new feature flag to indicate the presence of this instruction.
The feature flag is called AARCH64_ISA_RCPC8_4 and is included when targeting
armv8.4 architecture.
We also introduce an "arch" attribute value called "rcpc8_4" that is
checked against this feature flag.
gcc/
2018-09-19 Matthew Malcomson <matthew.malcomson@arm.com>
* config/aarch64/aarch64-protos.h
(aarch64_offset_9bit_signed_unscaled_p): New declaration.
* config/aarch64/aarch64.md (arches): New "rcpc8_4" attribute value.
(arch_enabled): Add check for "rcpc8_4" attribute value of "arch".
* config/aarch64/aarch64.h (AARCH64_FL_RCPC8_4): New bitfield.
(AARCH64_FL_FOR_ARCH8_4): Include AARCH64_FL_RCPC8_4.
(AARCH64_FL_PROFILE): Move index so flags are ordered.
(AARCH64_ISA_RCPC8_4): New flag.
* config/aarch64/aarch64.c (offset_9bit_signed_unscaled_p): Renamed
to aarch64_offset_9bit_signed_unscaled_p.
* config/aarch64/atomics.md (atomic_store<mode>): Allow offset
and use stlur.
* config/aarch64/constraints.md (Ust): New constraint.
* config/aarch64/predicates.md
(aarch64_9bit_offset_memory_operand): New predicate.
(aarch64_rcpc_memory_operand): New predicate.
gcc/testsuite/
2018-09-19 Matthew Malcomson <matthew.malcomson@arm.com>
* gcc.target/aarch64/atomic-store.c: New.
From-SVN: r264421
|
|
pair lanes
gcc/ChangeLog
2018-07-19 Andre Vieira <andre.simoesdiasvieira@arm.com>
PR target/83009
* config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): Make
address check not strict.
gcc/testsuite/ChangeLog
2018-07-19 Andre Vieira <andre.simoesdiasvieira@arm.com>
PR target/83009
* gcc.target/aarch64/store_v2vec_lanes.c: Add extra tests.
From-SVN: r262881
|
|
gcc/ChangeLog
2018-07-19 Andre Vieira <andre.simoesdiasvieira@arm.com>
* config/aarch64/aarch64-simd.md (aarch64_simd_mov<VQ:mode>): Replace
Umq with Umn.
(store_pair_lanes<mode>): Likewise.
* config/aarch64/aarch64-protos.h (aarch64_addr_query_type): Add new
enum value 'ADDR_QUERY_LDP_STP_N'.
* config/aarch64/aarch64.c (aarch64_addr_query_type): Likewise.
(aarch64_print_address_internal): Add declaration.
(aarch64_print_ldpstp_address): Remove.
(aarch64_classify_address): Adapt mode for 'ADDR_QUERY_LDP_STP_N'.
(aarch64_print_operand): Change printing of 'y'.
* config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): Use
new enum value 'ADDR_QUERY_LDP_STP_N', don't hardcode mode and use
'true' rather than '1'.
* gcc/config/aarch64/constraints.md (Uml): Likewise.
(Uml): Rename to Umn.
(Umq): Remove.
From-SVN: r262880
|
|
* config/aarch64/aarch64-protos.h, config/aarch64/aarch64.c
(aarch64_sve_prepare_conditional_op): Remove.
* config/aarch64/aarch64-sve.md (cond_<SVE_INT_BINARY><SVE_I>):
Allow aarch64_simd_reg_or_zero as select operand; remove
the aarch64_sve_prepare_conditional_op call.
(cond_<SVE_INT_BINARY_SD><SVE_SDI>): Likewise.
(cond_<SVE_COND_FP_BINARY><SVE_F>): Likewise.
(*cond_<SVE_INT_BINARY><SVE_I>_z): New pattern.
(*cond_<SVE_INT_BINARY_SD><SVE_SDI>_z): New pattern.
(*cond_<SVE_COND_FP_BINARY><SVE_F>_z): New pattern.
(*cond_<SVE_INT_BINARY><SVE_I>_any): New pattern.
(*cond_<SVE_INT_BINARY_SD><SVE_SDI>_any): New pattern.
(*cond_<SVE_COND_FP_BINARY><SVE_F>_any): New pattern
and splitters to match all of the *_any patterns.
* config/aarch64/predicates.md (aarch64_sve_any_binary_operator): New.
* config/aarch64/iterators.md (SVE_INT_BINARY_REV): Remove.
(SVE_COND_FP_BINARY_REV): Remove.
(sve_int_op_rev, sve_fp_op_rev): New.
* config/aarch64/aarch64-sve.md (*cond_<SVE_INT_BINARY><SVE_I>_0): New.
(*cond_<SVE_INT_BINARY_SD><SVE_SDI>_0): New.
(*cond_<SVE_COND_FP_BINARY><SVE_F>_0): New.
(*cond_<SVE_INT_BINARY><SVE_I>_2): Rename, add movprfx alternative.
(*cond_<SVE_INT_BINARY_SD><SVE_SDI>_2): Similarly.
(*cond_<SVE_COND_FP_BINARY><SVE_F>_2): Similarly.
(*cond_<SVE_INT_BINARY><SVE_I>_3): Similarly; use sve_int_op_rev.
(*cond_<SVE_INT_BINARY_SD><SVE_SDI>_3): Similarly.
(*cond_<SVE_COND_FP_BINARY><SVE_F>_3): Similarly; use sve_fp_op_rev.
* config/aarch64/aarch64-sve.md (cond_<SVE_COND_FP_BINARY><SVE_F>):
Remove match_dup 1 from the inner unspec.
(*cond_<SVE_COND_FP_BINARY><SVE_F>): Likewise.
* config/aarch64/aarch64.md (movprfx): New attr.
(length): Default movprfx to 8.
* config/aarch64/aarch64-sve.md (*mul<SVE_I>3): Add movprfx alt.
(*madd<SVE_I>, *msub<SVE_I>): Likewise.
(*<su>mul<SVE_I>3_highpart): Likewise.
(*<SVE_INT_BINARY_SD><SVE_SDI>3): Likewise.
(*v<ASHIFT><SVE_I>3): Likewise.
(*<su><MAXMIN><SVE_I>3): Likewise.
(*<su><MAXMIN><SVE_F>3): Likewise.
(*fma<SVE_F>4, *fnma<SVE_F>4): Likewise.
(*fms<SVE_F>4, *fnms<SVE_F>4): Likewise.
(*div<SVE_F>4): Likewise.
From-SVN: r262312
|
|
gcc
2018-05-30 Andre Vieira <andre.simoesdiasvieira@arm.com>
2018-05-24 Andre Vieira <andre.simoesdiasvieira@arm.com>
PR target/83009
Revert:
* config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): Make
address check not strict.
gcc/testsuite
2018-05-30 Andre Vieira <andre.simoesdiasvieira@arm.com>
2018-05-24 Andre Vieira <andre.simoesdiasvieira@arm.com>
Revert
PR target/83009
* gcc.target/aarch64/store_v2vec_lanes.c: Add extra tests.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@260635 138bc75d-0d04-0410-961f-82ee72b054a4
From-SVN: r260957
|
|
The operand constraint for the memory address of store/load pair lanes
strictly enforced that only hardware registers were allowed as memory
addresses. We want to relax that so that these patterns can be used by
combine. During register allocation the register constraint will enforce
that the correct register is chosen.
gcc
2018-05-24 Andre Vieira <andre.simoesdiasvieira@arm.com>
PR target/83009
* config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): Make
address check not strict.
gcc/testsuite
2018-05-24 Andre Vieira <andre.simoesdiasvieira@arm.com>
PR target/83009
* gcc.target/aarch64/store_v2vec_lanes.c: Add extra tests.
From-SVN: r260635
|
|
This patch merges loads and stores from D-registers that are of different modes.
Code like this:
typedef int __attribute__((vector_size(8))) vec;
struct pair
{
vec v;
double d;
};
Now generates a store pair instruction:
void
assign (struct pair *p, vec v)
{
p->v = v;
p->d = 1.0;
}
Whereas previously it generated two `str` instructions.
This patch also merges storing of double zero values with
long integer values:
struct pair
{
long long l;
double d;
};
void
foo (struct pair *p)
{
p->l = 10;
p->d = 0.0;
}
Now generates a single store pair instruction rather than two `str` instructions.
The patch basically generalises the mode iterators on the patterns in aarch64.md
and the peepholes in aarch64-ldpstp.md to take all combinations of pairs of modes
so, while it may be a large-ish patch, it does fairly mechanical stuff.
2018-05-22 Jackson Woodruff <jackson.woodruff@arm.com>
Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* config/aarch64/aarch64.md: New patterns to generate stp
and ldp.
(store_pair_sw, store_pair_dw): New patterns to generate stp for
single words and double words.
(load_pair_sw, load_pair_dw): Likewise.
(store_pair_sf, store_pair_df, store_pair_si, store_pair_di):
Delete.
(load_pair_sf, load_pair_df, load_pair_si, load_pair_di):
Delete.
* config/aarch64/aarch64-ldpstp.md: Modify peephole
for different mode ldpstp and add peephole for merged zero stores.
Likewise for loads.
* config/aarch64/aarch64.c (aarch64_operands_ok_for_ldpstp):
Add size check.
(aarch64_gen_store_pair): Rename calls to match new patterns.
(aarch64_gen_load_pair): Rename calls to match new patterns.
* config/aarch64/aarch64-simd.md (load_pair<mode>): Rename to...
(load_pair<DREG:mode><DREG2:mode>): ... This.
(store_pair<mode>): Rename to...
(vec_store_pair<DREG:mode><DREG2:mode>): ... This.
* config/aarch64/iterators.md (DREG, DREG2, DX2, SX, SX2, DSX):
New mode iterators.
(V_INT_EQUIV): Handle SImode.
* config/aarch64/predicates.md (aarch64_reg_zero_or_fp_zero):
New predicate.
* gcc.target/aarch64/ldp_stp_6.c: New.
* gcc.target/aarch64/ldp_stp_7.c: New.
* gcc.target/aarch64/ldp_stp_8.c: New.
Co-Authored-By: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
From-SVN: r260538
|
|
PR fortran/84565
* config/aarch64/predicates.md (aarch64_simd_reg_or_zero): Use
aarch64_simd_or_scalar_imm_zero rather than aarch64_simd_imm_zero.
* gfortran.dg/pr84565.f90: New test.
From-SVN: r258333
|
|
Subreg reads should be equivalent to storing the inner register to
memory and loading the appropriate memory bytes back, with subreg
writes doing the reverse. For the reasons explained in the comments,
this isn't what happens for big-endian SVE if we simply reinterpret
one vector register as having a different element size, so the
conceptual store and load is needed in the general case.
However, that obviously produces poor code if we do it too often.
The patch therefore adds a pattern for handling the operation in
registers. This copes with the important case of a VIEW_CONVERT
created by tree-vect-slp.c:duplicate_and_interleave.
It might make sense to tighten the predicates in aarch64-sve.md so
that such subregs are not allowed as operands to most instructions,
but that's future work.
This fixes the sve/slp_*.c tests on aarch64_be.
2018-02-01 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* config/aarch64/aarch64-protos.h (aarch64_split_sve_subreg_move)
(aarch64_maybe_expand_sve_subreg_move): Declare.
* config/aarch64/aarch64.md (UNSPEC_REV_SUBREG): New unspec.
* config/aarch64/predicates.md (aarch64_any_register_operand): New
predicate.
* config/aarch64/aarch64-sve.md (mov<mode>): Optimize subreg moves
that are semantically a reverse operation.
(*aarch64_sve_mov<mode>_subreg_be): New pattern.
* config/aarch64/aarch64.c (aarch64_maybe_expand_sve_subreg_move)
(aarch64_replace_reg_mode, aarch64_split_sve_subreg_move): New
functions.
(aarch64_can_change_mode_class): For big-endian, forbid changes
between two SVE modes if they have different element sizes.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
From-SVN: r257289
|
|
This fixes PR82964, which reports ICEs for some CONST_WIDE_INT immediates.
It turns out decimal floating-point CONST_DOUBLEs get changed into
CONST_WIDE_INT without checking the constraint on the operand, which
results in failures. Avoid this by only allowing SF/DF/TF mode floating
point constants in aarch64_legitimate_constant_p. A similar issue can
occur with 128-bit immediates which may be emitted even when disallowed
in aarch64_legitimate_constant_p, and the constraints in movti_aarch64
don't match. Fix this with a new constraint and allowing valid immediates
in aarch64_legitimate_constant_p.
Rather than allowing all 128-bit immediates and expanding in up to 8
MOV/MOVK instructions, limit them to 4 instructions and use a literal
load for other cases. Improve a few TImode tests to use a literal and
ensure they are skipped with -fpic.
This fixes all reported failures.
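As a sketch (mine) of the new cut-off: a TImode constant whose halves
each need a MOV plus one MOVK stays inline at four instructions, while
anything larger is now loaded from a literal pool:
```
__int128
big_const (void)
{
  return ((__int128) 0x12345678 << 64) | 0xdeadbeef;   /* 4 insns */
}
```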
gcc/
PR target/82964
* config/aarch64/aarch64.md (movti_aarch64): Use Uti constraint.
* config/aarch64/aarch64.c (aarch64_mov128_immediate): New function.
(aarch64_legitimate_constant_p): Just support CONST_DOUBLE
SF/DF/TF mode to avoid creating illegal CONST_WIDE_INT immediates.
* config/aarch64/aarch64-protos.h (aarch64_mov128_immediate):
Add declaration.
* config/aarch64/predicates.md (aarch64_movti_operand):
Limit immediates.
* config/aarch64/constraints.md (Uti): New constraint.
gcc/testsuite/
PR target/79041
PR target/82964
* gcc.target/aarch64/pr79041-2.c: Improve test, disable with fpic.
* gcc.target/aarch64/pr78733.c: Improve test, disable with fpic.
Co-Authored-By: Richard Sandiford <richard.sandiford@linaro.org>
From-SVN: r256800
|
|
This patch adds support for SVE gather loads. It uses basically
the same analysis code as the AVX gather support, but after that
there are two major differences:
- It uses new internal functions rather than target built-ins.
The interface is:
IFN_GATHER_LOAD (base, offsets, scale)
IFN_MASK_GATHER_LOAD (base, offsets, scale, mask)
which should be reasonably generic. One of the advantages of
using internal functions is that other passes can understand what
the functions do, but a more immediate advantage is that we can
query the underlying target pattern to see which scales it supports.
- It uses pattern recognition to convert the offset to the right width,
if it was originally narrower than that. This avoids having to do
a widening operation as part of the gather expansion itself.
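The canonical shape this recognizes is an indexed load like the sketch
below (mine); note the 32-bit offsets, which pattern recognition widens
before expansion:
```
void
gather (double *restrict dst, double *restrict src,
        int *restrict index, int n)
{
  for (int i = 0; i < n; ++i)
    dst[i] = src[index[i]];   /* becomes IFN_GATHER_LOAD */
}
```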
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (gather_load@var{m}): Document.
(mask_gather_load@var{m}): Likewise.
* genopinit.c (main): Add supports_vec_gather_load and
supports_vec_gather_load_cached to target_optabs.
* optabs-tree.c (init_tree_optimization_optabs): Use
ggc_cleared_alloc to allocate target_optabs.
* optabs.def (gather_load_optab, mask_gather_load_optab): New optabs.
* internal-fn.def (GATHER_LOAD, MASK_GATHER_LOAD): New internal
functions.
* internal-fn.h (internal_load_fn_p): Declare.
(internal_gather_scatter_fn_p): Likewise.
(internal_fn_mask_index): Likewise.
(internal_gather_scatter_fn_supported_p): Likewise.
* internal-fn.c (gather_load_direct): New macro.
(expand_gather_load_optab_fn): New function.
(direct_gather_load_optab_supported_p): New macro.
(direct_internal_fn_optab): New function.
(internal_load_fn_p): Likewise.
(internal_gather_scatter_fn_p): Likewise.
(internal_fn_mask_index): Likewise.
(internal_gather_scatter_fn_supported_p): Likewise.
* optabs-query.c (supports_at_least_one_mode_p): New function.
(supports_vec_gather_load_p): Likewise.
* optabs-query.h (supports_vec_gather_load_p): Declare.
* tree-vectorizer.h (gather_scatter_info): Add ifn, element_type
and memory_type field.
(NUM_PATTERNS): Bump to 15.
* tree-vect-data-refs.c: Include internal-fn.h.
(vect_gather_scatter_fn_p): New function.
(vect_describe_gather_scatter_call): Likewise.
(vect_check_gather_scatter): Try using internal functions for
gather loads. Recognize existing calls to a gather load function.
(vect_analyze_data_refs): Consider using gather loads if
supports_vec_gather_load_p.
* tree-vect-patterns.c (vect_get_load_store_mask): New function.
(vect_get_gather_scatter_offset_type): Likewise.
(vect_convert_mask_for_vectype): Likewise.
(vect_add_conversion_to_patterm): Likewise.
(vect_try_gather_scatter_pattern): Likewise.
(vect_recog_gather_scatter_pattern): New pattern recognizer.
(vect_vect_recog_func_ptrs): Add it.
* tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use
internal_fn_mask_index and internal_gather_scatter_fn_p.
(check_load_store_masking): Take the gather_scatter_info as an
argument and handle gather loads.
(vect_get_gather_scatter_ops): New function.
(vectorizable_call): Check internal_load_fn_p.
(vectorizable_load): Likewise. Handle gather load internal
functions.
(vectorizable_store): Update call to check_load_store_masking.
* config/aarch64/aarch64.md (UNSPEC_LD1_GATHER): New unspec.
* config/aarch64/iterators.md (SVE_S, SVE_D): New mode iterators.
* config/aarch64/predicates.md (aarch64_gather_scale_operand_w)
(aarch64_gather_scale_operand_d): New predicates.
* config/aarch64/aarch64-sve.md (gather_load<mode>): New expander.
(mask_gather_load<mode>): New insns.
gcc/testsuite/
* gcc.target/aarch64/sve/gather_load_1.c: New test.
* gcc.target/aarch64/sve/gather_load_2.c: Likewise.
* gcc.target/aarch64/sve/gather_load_3.c: Likewise.
* gcc.target/aarch64/sve/gather_load_4.c: Likewise.
* gcc.target/aarch64/sve/gather_load_5.c: Likewise.
* gcc.target/aarch64/sve/gather_load_6.c: Likewise.
* gcc.target/aarch64/sve/gather_load_7.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_1.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_2.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_3.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_4.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_5.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256640
|
|
This patch adds support for SVE LD[234], ST[234] and associated
structure modes. Unlike Advanced SIMD, these modes are extra-long
vector modes instead of integer modes.
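A typical consumer is a de-interleaving loop like this sketch of mine,
which can now vectorize using LD2:
```
void
deinterleave (float *restrict re, float *restrict im,
              const float *restrict src, int n)
{
  for (int i = 0; i < n; ++i)
    {
      re[i] = src[2 * i];
      im[i] = src[2 * i + 1];
    }
}
```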
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64-modes.def: Define x2, x3 and x4 vector
modes for SVE.
* config/aarch64/aarch64-protos.h
(aarch64_sve_struct_memory_operand_p): Declare.
* config/aarch64/iterators.md (SVE_STRUCT): New mode iterator.
(vector_count, insn_length, VSINGLE, vsingle): New mode attributes.
(VPRED, vpred): Handle SVE structure modes.
* config/aarch64/constraints.md (Utx): New constraint.
* config/aarch64/predicates.md (aarch64_sve_struct_memory_operand)
(aarch64_sve_struct_nonimmediate_operand): New predicates.
* config/aarch64/aarch64.md (UNSPEC_LDN, UNSPEC_STN): New unspecs.
* config/aarch64/aarch64-sve.md (mov<mode>, *aarch64_sve_mov<mode>_le)
(*aarch64_sve_mov<mode>_be, pred_mov<mode>): New patterns for
structure modes. Split into pieces after RA.
(vec_load_lanes<mode><vsingle>, vec_mask_load_lanes<mode><vsingle>)
(vec_store_lanes<mode><vsingle>, vec_mask_store_lanes<mode><vsingle>):
New patterns.
* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Handle
SVE structure modes.
(aarch64_classify_address): Likewise.
(sizetochar): Move earlier in file.
(aarch64_print_operand): Handle SVE register lists.
(aarch64_array_mode): New function.
(aarch64_sve_struct_memory_operand_p): Likewise.
(TARGET_ARRAY_MODE): Redefine.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_load_lanes):
Return true for SVE too.
* g++.dg/vect/pr36648.cc: XFAIL for variable-length vectors
if load/store lanes are supported.
* gcc.dg/vect/slp-10.c: Likewise.
* gcc.dg/vect/slp-12c.c: Likewise.
* gcc.dg/vect/slp-17.c: Likewise.
* gcc.dg/vect/slp-33.c: Likewise.
* gcc.dg/vect/slp-6.c: Likewise.
* gcc.dg/vect/slp-cond-1.c: Likewise.
* gcc.dg/vect/slp-multitypes-11-big-array.c: Likewise.
* gcc.dg/vect/slp-multitypes-11.c: Likewise.
* gcc.dg/vect/slp-multitypes-12.c: Likewise.
* gcc.dg/vect/slp-perm-5.c: Remove XFAIL for variable-length SVE.
* gcc.dg/vect/slp-perm-6.c: Likewise.
* gcc.dg/vect/slp-perm-9.c: Likewise.
* gcc.dg/vect/slp-reduc-6.c: Remove XFAIL for variable-length vectors.
* gcc.dg/vect/vect-load-lanes-peeling-1.c: Expect an epilogue loop
for variable-length vectors.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256618
|
|
This patch adds support for ARM's Scalable Vector Extension.
The patch just contains the core features that work with the
current vectoriser framework; later patches will add extra
capabilities to both the target-independent code and AArch64 code.
The patch doesn't include:
- support for unwinding frames whose size depends on the vector length
- modelling the effect of __tls_get_addr on the SVE registers
These are handled by later patches instead.
Some notes:
- The copyright years for aarch64-sve.md start at 2009 because some of
the code is based on aarch64.md, which also starts from then.
- The patch inserts spaces between items in the AArch64 section
of sourcebuild.texi. This matches at least the surrounding
architectures and looks a little nicer in the info output.
- aarch64-sve.md includes a pattern:
while_ult<GPI:mode><PRED_ALL:mode>
A later patch adds a matching "while_ult" optab, but the pattern
is also needed by the predicate vec_duplicate expander.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/invoke.texi (-msve-vector-bits=): Document new option.
(sve): Document new AArch64 extension.
* doc/md.texi (w): Extend the description of the AArch64
constraint to include SVE vectors.
(Upl, Upa): Document new AArch64 predicate constraints.
* config/aarch64/aarch64-opts.h (aarch64_sve_vector_bits_enum): New
enum.
* config/aarch64/aarch64.opt (sve_vector_bits): New enum.
(msve-vector-bits=): New option.
* config/aarch64/aarch64-option-extensions.def (fp, simd): Disable
SVE when these are disabled.
(sve): New extension.
* config/aarch64/aarch64-modes.def: Define SVE vector and predicate
modes. Adjust their number of units based on aarch64_sve_vg.
(MAX_BITSIZE_MODE_ANY_MODE): Define.
* config/aarch64/aarch64-protos.h (ADDR_QUERY_ANY): New
aarch64_addr_query_type.
(aarch64_const_vec_all_same_in_range_p, aarch64_sve_pred_mode)
(aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p)
(aarch64_sve_inc_dec_immediate_p, aarch64_add_offset_temporaries)
(aarch64_split_add_offset, aarch64_output_sve_cnt_immediate)
(aarch64_output_sve_addvl_addpl, aarch64_output_sve_inc_dec_immediate)
(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): Declare.
(aarch64_simd_imm_zero_p): Delete.
(aarch64_check_zero_based_sve_index_immediate): Declare.
(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): Likewise.
(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
rather than an rtx.
(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): Declare.
(aarch64_expand_mov_immediate): Take a gen_vec_duplicate callback.
(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move): Declare.
(aarch64_expand_sve_vec_cmp_int, aarch64_expand_sve_vec_cmp_float)
(aarch64_expand_sve_vcond, aarch64_expand_sve_vec_perm): Declare.
(aarch64_regmode_natural_size): Likewise.
* config/aarch64/aarch64.h (AARCH64_FL_SVE): New macro.
(AARCH64_FL_V8_3, AARCH64_FL_RCPC, AARCH64_FL_DOTPROD): Shift
left one place.
(AARCH64_ISA_SVE, TARGET_SVE): New macros.
(FIXED_REGISTERS, CALL_USED_REGISTERS, REGISTER_NAMES): Add entries
for VG and the SVE predicate registers.
(V_ALIASES): Add a "z"-prefixed alias.
(FIRST_PSEUDO_REGISTER): Change to P15_REGNUM + 1.
(AARCH64_DWARF_VG, AARCH64_DWARF_P0): New macros.
(PR_REGNUM_P, PR_LO_REGNUM_P): Likewise.
(PR_LO_REGS, PR_HI_REGS, PR_REGS): New reg_classes.
(REG_CLASS_NAMES): Add entries for them.
(REG_CLASS_CONTENTS): Likewise. Update ALL_REGS to include VG
and the predicate registers.
(aarch64_sve_vg): Declare.
(BITS_PER_SVE_VECTOR, BYTES_PER_SVE_VECTOR, BYTES_PER_SVE_PRED)
(SVE_BYTE_MODE, MAX_COMPILE_TIME_VEC_BYTES): New macros.
(REGMODE_NATURAL_SIZE): Define.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Handle
SVE macros.
* config/aarch64/aarch64.c: Include cfgrtl.h.
(simd_immediate_info): Add a constructor for series vectors,
and an associated step field.
(aarch64_sve_vg): New variable.
(aarch64_dbx_register_number): Handle VG and the predicate registers.
(aarch64_vect_struct_mode_p, aarch64_vector_mode_p): Delete.
(VEC_ADVSIMD, VEC_SVE_DATA, VEC_SVE_PRED, VEC_STRUCT, VEC_ANY_SVE)
(VEC_ANY_DATA): New constants.
(aarch64_advsimd_struct_mode_p, aarch64_sve_pred_mode_p)
(aarch64_classify_vector_mode, aarch64_vector_data_mode_p)
(aarch64_sve_data_mode_p, aarch64_sve_pred_mode)
(aarch64_get_mask_mode): New functions.
(aarch64_hard_regno_nregs): Handle SVE data modes for FP_REGS
and FP_LO_REGS. Handle PR_REGS, PR_LO_REGS and PR_HI_REGS.
(aarch64_hard_regno_mode_ok): Handle VG. Also handle the SVE
predicate modes and predicate registers. Explicitly restrict
GPRs to modes of 16 bytes or smaller. Only allow FP registers
to store a vector mode if it is recognized by
aarch64_classify_vector_mode.
(aarch64_regmode_natural_size): New function.
(aarch64_hard_regno_caller_save_mode): Return the original mode
for predicates.
(aarch64_sve_cnt_immediate_p, aarch64_output_sve_cnt_immediate)
(aarch64_sve_addvl_addpl_immediate_p, aarch64_output_sve_addvl_addpl)
(aarch64_sve_inc_dec_immediate_p, aarch64_output_sve_inc_dec_immediate)
(aarch64_add_offset_1_temporaries, aarch64_offset_temporaries): New
functions.
(aarch64_add_offset): Add a temp2 parameter. Assert that temp1
does not overlap dest if the function is frame-related. Handle
SVE constants.
(aarch64_split_add_offset): New function.
(aarch64_add_sp, aarch64_sub_sp): Add temp2 parameters and pass
them aarch64_add_offset.
(aarch64_allocate_and_probe_stack_space): Add a temp2 parameter
and update call to aarch64_sub_sp.
(aarch64_add_cfa_expression): New function.
(aarch64_expand_prologue): Pass extra temporary registers to the
functions above. Handle the case in which we need to emit new
DW_CFA_expressions for registers that were originally saved
relative to the stack pointer, but now have to be expressed
relative to the frame pointer.
(aarch64_output_mi_thunk): Pass extra temporary registers to the
functions above.
(aarch64_expand_epilogue): Likewise. Prevent inheritance of
IP0 and IP1 values for SVE frames.
(aarch64_expand_vec_series): New function.
(aarch64_expand_sve_widened_duplicate): Likewise.
(aarch64_expand_sve_const_vector): Likewise.
(aarch64_expand_mov_immediate): Add a gen_vec_duplicate parameter.
Handle SVE constants. Use emit_move_insn to move a force_const_mem
into the register, rather than emitting a SET directly.
(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move)
(aarch64_get_reg_raw_mode, offset_4bit_signed_scaled_p)
(offset_6bit_unsigned_scaled_p, aarch64_offset_7bit_signed_scaled_p)
(offset_9bit_signed_scaled_p): New functions.
(aarch64_replicate_bitmask_imm): New function.
(aarch64_bitmask_imm): Use it.
(aarch64_cannot_force_const_mem): Reject expressions involving
a CONST_POLY_INT. Update call to aarch64_classify_symbol.
(aarch64_classify_index): Handle SVE indices, by requiring
a plain register index with a scale that matches the element size.
(aarch64_classify_address): Handle SVE addresses. Assert that
the mode of the address is VOIDmode or an integer mode.
Update call to aarch64_classify_symbol.
(aarch64_classify_symbolic_expression): Update call to
aarch64_classify_symbol.
(aarch64_const_vec_all_in_range_p): New function.
(aarch64_print_vector_float_operand): Likewise.
(aarch64_print_operand): Handle 'N' and 'C'. Use "zN" rather than
"vN" for FP registers with SVE modes. Handle (const ...) vectors
and the FP immediates 1.0 and 0.5.
(aarch64_print_address_internal): Handle SVE addresses.
(aarch64_print_operand_address): Use ADDR_QUERY_ANY.
(aarch64_regno_regclass): Handle predicate registers.
(aarch64_secondary_reload): Handle big-endian reloads of SVE
data modes.
(aarch64_class_max_nregs): Handle SVE modes and predicate registers.
(aarch64_rtx_costs): Check for ADDVL and ADDPL instructions.
(aarch64_convert_sve_vector_bits): New function.
(aarch64_override_options): Use it to handle -msve-vector-bits=.
(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
rather than an rtx.
(aarch64_legitimate_constant_p): Use aarch64_classify_vector_mode.
Handle SVE vector and predicate modes. Accept VL-based constants
that need only one temporary register, and VL offsets that require
no temporary registers.
(aarch64_conditional_register_usage): Mark the predicate registers
as fixed if SVE isn't available.
(aarch64_vector_mode_supported_p): Use aarch64_classify_vector_mode.
Return true for SVE vector and predicate modes.
(aarch64_simd_container_mode): Take the number of bits as a poly_int64
rather than an unsigned int. Handle SVE modes.
(aarch64_preferred_simd_mode): Update call accordingly. Handle
SVE modes.
(aarch64_autovectorize_vector_sizes): Add BYTES_PER_SVE_VECTOR
if SVE is enabled.
(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): New functions.
(aarch64_sve_valid_immediate): New function.
(aarch64_simd_valid_immediate): Use it as the fallback for SVE vectors.
Explicitly reject structure modes. Check for INDEX constants.
Handle PTRUE and PFALSE constants.
(aarch64_check_zero_based_sve_index_immediate): New function.
(aarch64_simd_imm_zero_p): Delete.
(aarch64_mov_operand_p): Use aarch64_simd_valid_immediate for
vector modes. Accept constants in the range of CNT[BHWD].
(aarch64_simd_scalar_immediate_valid_for_move): Explicitly
ask for an Advanced SIMD mode.
(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): New functions.
(aarch64_simd_vector_alignment): Handle SVE predicates.
(aarch64_vectorize_preferred_vector_alignment): New function.
(aarch64_simd_vector_alignment_reachable): Use it instead of
the vector size.
(aarch64_shift_truncation_mask): Use aarch64_vector_data_mode_p.
(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): New
functions.
(MAX_VECT_LEN): Delete.
(expand_vec_perm_d): Add a vec_flags field.
(emit_unspec2, aarch64_expand_sve_vec_perm): New functions.
(aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_zip)
(aarch64_evpc_ext): Don't apply a big-endian lane correction
for SVE modes.
(aarch64_evpc_rev): Rename to...
(aarch64_evpc_rev_local): ...this. Use a predicated operation for SVE.
(aarch64_evpc_rev_global): New function.
(aarch64_evpc_dup): Enforce a 64-byte range for SVE DUP.
(aarch64_evpc_tbl): Use MAX_COMPILE_TIME_VEC_BYTES instead of
MAX_VECT_LEN.
(aarch64_evpc_sve_tbl): New function.
(aarch64_expand_vec_perm_const_1): Update after rename of
aarch64_evpc_rev. Handle SVE permutes too, trying
aarch64_evpc_rev_global and using aarch64_evpc_sve_tbl rather
than aarch64_evpc_tbl.
(aarch64_vectorize_vec_perm_const): Initialize vec_flags.
(aarch64_sve_cmp_operand_p, aarch64_unspec_cond_code)
(aarch64_gen_unspec_cond, aarch64_expand_sve_vec_cmp_int)
(aarch64_emit_unspec_cond, aarch64_emit_unspec_cond_or)
(aarch64_emit_inverted_unspec_cond, aarch64_expand_sve_vec_cmp_float)
(aarch64_expand_sve_vcond): New functions.
(aarch64_modes_tieable_p): Use aarch64_vector_data_mode_p instead
of aarch64_vector_mode_p.
(aarch64_dwarf_poly_indeterminate_value): New function.
(aarch64_compute_pressure_classes): Likewise.
(aarch64_can_change_mode_class): Likewise.
(TARGET_GET_RAW_RESULT_MODE, TARGET_GET_RAW_ARG_MODE): Redefine.
(TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Likewise.
(TARGET_VECTORIZE_GET_MASK_MODE): Likewise.
(TARGET_DWARF_POLY_INDETERMINATE_VALUE): Likewise.
(TARGET_COMPUTE_PRESSURE_CLASSES): Likewise.
(TARGET_CAN_CHANGE_MODE_CLASS): Likewise.
* config/aarch64/constraints.md (Upa, Upl, Uav, Uat, Usv, Usi, Utr)
(Uty, Dm, vsa, vsc, vsd, vsi, vsn, vsl, vsm, vsA, vsM, vsN): New
constraints.
(Dn, Dl, Dr): Accept const as well as const_vector.
(Dz): Likewise. Compare against CONST0_RTX.
* config/aarch64/iterators.md: Refer to "Advanced SIMD" instead
of "vector" where appropriate.
(SVE_ALL, SVE_BH, SVE_BHS, SVE_BHSI, SVE_HSDI, SVE_HSF, SVE_SD)
(SVE_SDI, SVE_I, SVE_F, PRED_ALL, PRED_BHS): New mode iterators.
(UNSPEC_SEL, UNSPEC_ANDF, UNSPEC_IORF, UNSPEC_XORF, UNSPEC_COND_LT)
(UNSPEC_COND_LE, UNSPEC_COND_EQ, UNSPEC_COND_NE, UNSPEC_COND_GE)
(UNSPEC_COND_GT, UNSPEC_COND_LO, UNSPEC_COND_LS, UNSPEC_COND_HS)
(UNSPEC_COND_HI, UNSPEC_COND_UO): New unspecs.
(Vetype, VEL, Vel, VWIDE, Vwide, vw, vwcore, V_INT_EQUIV)
(v_int_equiv): Extend to SVE modes.
(Vesize, V128, v128, Vewtype, V_FP_EQUIV, v_fp_equiv, VPRED): New
mode attributes.
(LOGICAL_OR, SVE_INT_UNARY, SVE_FP_UNARY): New code iterators.
(optab): Handle popcount, smin, smax, umin, umax, abs and sqrt.
(logical_nn, lr, sve_int_op, sve_fp_op): New code attributes.
(LOGICALF, OPTAB_PERMUTE, UNPACK, UNPACK_UNSIGNED, SVE_COND_INT_CMP)
(SVE_COND_FP_CMP): New int iterators.
(perm_hilo): Handle the new unpack unspecs.
(optab, logicalf_op, su, perm_optab, cmp_op, imm_con): New int
attributes.
* config/aarch64/predicates.md (aarch64_sve_cnt_immediate)
(aarch64_sve_addvl_addpl_immediate, aarch64_split_add_offset_immediate)
(aarch64_pluslong_or_poly_operand, aarch64_nonmemory_operand)
(aarch64_equality_operator, aarch64_constant_vector_operand)
(aarch64_sve_ld1r_operand, aarch64_sve_ldr_operand): New predicates.
(aarch64_sve_nonimmediate_operand): Likewise.
(aarch64_sve_general_operand): Likewise.
(aarch64_sve_dup_operand, aarch64_sve_arith_immediate): Likewise.
(aarch64_sve_sub_arith_immediate, aarch64_sve_inc_dec_immediate)
(aarch64_sve_logical_immediate, aarch64_sve_mul_immediate): Likewise.
(aarch64_sve_dup_immediate, aarch64_sve_cmp_vsc_immediate): Likewise.
(aarch64_sve_cmp_vsd_immediate, aarch64_sve_index_immediate): Likewise.
(aarch64_sve_float_arith_immediate): Likewise.
(aarch64_sve_float_arith_with_sub_immediate): Likewise.
(aarch64_sve_float_mul_immediate, aarch64_sve_arith_operand): Likewise.
(aarch64_sve_add_operand, aarch64_sve_logical_operand): Likewise.
(aarch64_sve_lshift_operand, aarch64_sve_rshift_operand): Likewise.
(aarch64_sve_mul_operand, aarch64_sve_cmp_vsc_operand): Likewise.
(aarch64_sve_cmp_vsd_operand, aarch64_sve_index_operand): Likewise.
(aarch64_sve_float_arith_operand): Likewise.
(aarch64_sve_float_arith_with_sub_operand): Likewise.
(aarch64_sve_float_mul_operand): Likewise.
(aarch64_sve_vec_perm_operand): Likewise.
(aarch64_pluslong_operand): Include aarch64_sve_addvl_addpl_immediate.
(aarch64_mov_operand): Accept const_poly_int and const_vector.
(aarch64_simd_lshift_imm, aarch64_simd_rshift_imm): Accept const
as well as const_vector.
(aarch64_simd_imm_zero, aarch64_simd_imm_minus_one): Move earlier
in file. Use CONST0_RTX and CONSTM1_RTX.
(aarch64_simd_or_scalar_imm_zero): Likewise. Add match_codes.
(aarch64_simd_reg_or_zero): Accept const as well as const_vector.
Use aarch64_simd_imm_zero.
* config/aarch64/aarch64-sve.md: New file.
* config/aarch64/aarch64.md: Include it.
(VG_REGNUM, P0_REGNUM, P7_REGNUM, P15_REGNUM): New register numbers.
(UNSPEC_REV, UNSPEC_LD1_SVE, UNSPEC_ST1_SVE, UNSPEC_MERGE_PTRUE)
(UNSPEC_PTEST_PTRUE, UNSPEC_UNPACKSHI, UNSPEC_UNPACKUHI)
(UNSPEC_UNPACKSLO, UNSPEC_UNPACKULO, UNSPEC_PACK)
(UNSPEC_FLOAT_CONVERT, UNSPEC_WHILE_LO): New unspec constants.
(sve): New attribute.
(enabled): Disable instructions with the sve attribute unless
TARGET_SVE.
(movqi, movhi): Pass CONST_POLY_INT operands through
aarch64_expand_mov_immediate.
(*mov<mode>_aarch64, *movsi_aarch64, *movdi_aarch64): Handle
CNT[BHSD] immediates.
(movti): Split CONST_POLY_INT moves into two halves.
(add<mode>3): Accept aarch64_pluslong_or_poly_operand.
Split additions that need a temporary here if the destination
is the stack pointer.
(*add<mode>3_aarch64): Handle ADDVL and ADDPL immediates.
(*add<mode>3_poly_1): New instruction.
(set_clobber_cc): New expander.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256612
|
|
2018-01-10 Michael Collison <michael.collison@arm.com>
* config/aarch64/aarch64-modes.def (V2HF): New VECTOR_MODE.
* config/aarch64/aarch64-option-extensions.def: Add
AARCH64_OPT_EXTENSION of 'fp16fml'.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_FP16_FML if TARGET_F16FML is true.
* config/aarch64/predicates.md (aarch64_lane_imm3): New predicate.
* config/aarch64/constraints.md (Ui7): New constraint.
* config/aarch64/iterators.md (VFMLA_W): New mode iterator.
(VFMLA_SEL_W): Ditto.
(f16quad): Ditto.
(f16mac1): Ditto.
(VFMLA16_LOW): New int iterator.
(VFMLA16_HIGH): Ditto.
(UNSPEC_FMLAL): New unspec.
(UNSPEC_FMLSL): Ditto.
(UNSPEC_FMLAL2): Ditto.
(UNSPEC_FMLSL2): Ditto.
(f16mac): New code attribute.
* config/aarch64/aarch64-simd-builtins.def
(aarch64_fmlal_lowv2sf): New builtin.
(aarch64_fmlsl_lowv2sf): Ditto.
(aarch64_fmlalq_lowv4sf): Ditto.
(aarch64_fmlslq_lowv4sf): Ditto.
(aarch64_fmlal_highv2sf): Ditto.
(aarch64_fmlsl_highv2sf): Ditto.
(aarch64_fmlalq_highv4sf): Ditto.
(aarch64_fmlslq_highv4sf): Ditto.
(aarch64_fmlal_lane_lowv2sf): Ditto.
(aarch64_fmlsl_lane_lowv2sf): Ditto.
(aarch64_fmlal_laneq_lowv2sf): Ditto.
(aarch64_fmlsl_laneq_lowv2sf): Ditto.
(aarch64_fmlalq_lane_lowv4sf): Ditto.
(aarch64_fmlsl_lane_lowv4sf): Ditto.
(aarch64_fmlalq_laneq_lowv4sf): Ditto.
(aarch64_fmlsl_laneq_lowv4sf): Ditto.
(aarch64_fmlal_lane_highv2sf): Ditto.
(aarch64_fmlsl_lane_highv2sf): Ditto.
(aarch64_fmlal_laneq_highv2sf): Ditto.
(aarch64_fmlsl_laneq_highv2sf): Ditto.
(aarch64_fmlalq_lane_highv4sf): Ditto.
(aarch64_fmlsl_lane_highv4sf): Ditto.
(aarch64_fmlalq_laneq_highv4sf): Ditto.
(aarch64_fmlsl_laneq_highv4sf): Ditto.
* config/aarch64/aarch64-simd.md:
(aarch64_fml<f16mac1>l<f16quad>_low<mode>): New pattern.
(aarch64_fml<f16mac1>l<f16quad>_high<mode>): Ditto.
(aarch64_simd_fml<f16mac1>l<f16quad>_low<mode>): Ditto.
(aarch64_simd_fml<f16mac1>l<f16quad>_high<mode>): Ditto.
(aarch64_fml<f16mac1>l_lane_lowv2sf): Ditto.
(aarch64_fml<f16mac1>l_lane_highv2sf): Ditto.
(aarch64_simd_fml<f16mac>l_lane_lowv2sf): Ditto.
(aarch64_simd_fml<f16mac>l_lane_highv2sf): Ditto.
(aarch64_fml<f16mac1>lq_laneq_lowv4sf): Ditto.
(aarch64_fml<f16mac1>lq_laneq_highv4sf): Ditto.
(aarch64_simd_fml<f16mac>lq_laneq_lowv4sf): Ditto.
(aarch64_simd_fml<f16mac>lq_laneq_highv4sf): Ditto.
(aarch64_fml<f16mac1>l_laneq_lowv2sf): Ditto.
(aarch64_fml<f16mac1>l_laneq_highv2sf): Ditto.
(aarch64_simd_fml<f16mac>l_laneq_lowv2sf): Ditto.
(aarch64_simd_fml<f16mac>l_laneq_highv2sf): Ditto.
(aarch64_fml<f16mac1>lq_lane_lowv4sf): Ditto.
(aarch64_fml<f16mac1>lq_lane_highv4sf): Ditto.
(aarch64_simd_fml<f16mac>lq_lane_lowv4sf): Ditto.
(aarch64_simd_fml<f16mac>lq_lane_highv4sf): Ditto.
* config/aarch64/arm_neon.h (vfmlal_low_u32): New intrinsic.
(vfmlsl_low_u32): Ditto.
(vfmlalq_low_u32): Ditto.
(vfmlslq_low_u32): Ditto.
(vfmlal_high_u32): Ditto.
(vfmlsl_high_u32): Ditto.
(vfmlalq_high_u32): Ditto.
(vfmlslq_high_u32): Ditto.
(vfmlal_lane_low_u32): Ditto.
(vfmlsl_lane_low_u32): Ditto.
(vfmlal_laneq_low_u32): Ditto.
(vfmlsl_laneq_low_u32): Ditto.
(vfmlalq_lane_low_u32): Ditto.
(vfmlslq_lane_low_u32): Ditto.
(vfmlalq_laneq_low_u32): Ditto.
(vfmlslq_laneq_low_u32): Ditto.
(vfmlal_lane_high_u32): Ditto.
(vfmlsl_lane_high_u32): Ditto.
(vfmlal_laneq_high_u32): Ditto.
(vfmlsl_laneq_high_u32): Ditto.
(vfmlalq_lane_high_u32): Ditto.
(vfmlslq_lane_high_u32): Ditto.
(vfmlalq_laneq_high_u32): Ditto.
(vfmlslq_laneq_high_u32): Ditto.
* config/aarch64/aarch64.h (AARCH64_FL_F16FML): New flag.
(AARCH64_FL_FOR_ARCH8_4): New.
(AARCH64_ISA_F16FML): New ISA flag.
(TARGET_F16FML): New feature flag for fp16fml.
* doc/invoke.texi: Document new fp16fml option.
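A minimal usage sketch for the new intrinsics, assuming the signatures
this patch adds to arm_neon.h (float32 accumulator, float16 inputs; the
_u32 suffix is the spelling used at the time):

  #include <arm_neon.h>

  /* Widening multiply-accumulate of the low half: for i = 0,1,
     acc[i] += a[i] * b[i], with the float16 products widened to float32.
     Requires compiling with the fp16fml extension enabled.  */
  float32x2_t
  fmlal_low (float32x2_t acc, float16x4_t a, float16x4_t b)
  {
    return vfmlal_low_u32 (acc, a, b);
  }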
2018-01-10 Michael Collison <michael.collison@arm.com>
* config/aarch64/aarch64-builtins.c:
(aarch64_types_ternopu_imm_qualifiers, TYPES_TERNOPUI): New.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_SHA3 if TARGET_SHA3 is true.
* config/aarch64/aarch64.h (AARCH64_FL_SHA3): New flag.
(AARCH64_ISA_SHA3): New ISA flag.
(TARGET_SHA3): New feature flag for sha3.
* config/aarch64/iterators.md (sha512_op): New int attribute.
(CRYPTO_SHA512): New int iterator.
(UNSPEC_SHA512H): New unspec.
(UNSPEC_SHA512H2): Ditto.
(UNSPEC_SHA512SU0): Ditto.
(UNSPEC_SHA512SU1): Ditto.
* config/aarch64/aarch64-simd-builtins.def
(aarch64_crypto_sha512hqv2di): New builtin.
(aarch64_crypto_sha512h2qv2di): Ditto.
(aarch64_crypto_sha512su0qv2di): Ditto.
(aarch64_crypto_sha512su1qv2di): Ditto.
(aarch64_eor3qv8hi): Ditto.
(aarch64_rax1qv2di): Ditto.
(aarch64_xarqv2di): Ditto.
(aarch64_bcaxqv8hi): Ditto.
* config/aarch64/aarch64-simd.md:
(aarch64_crypto_sha512h<sha512_op>qv2di): New pattern.
(aarch64_crypto_sha512su0qv2di): Ditto.
(aarch64_crypto_sha512su1qv2di): Ditto.
(aarch64_eor3qv8hi): Ditto.
(aarch64_rax1qv2di): Ditto.
(aarch64_xarqv2di): Ditto.
(aarch64_bcaxqv8hi): Ditto.
* config/aarch64/arm_neon.h (vsha512hq_u64): New intrinsic.
(vsha512h2q_u64): Ditto.
(vsha512su0q_u64): Ditto.
(vsha512su1q_u64): Ditto.
(veor3q_u16): Ditto.
(vrax1q_u64): Ditto.
(vxarq_u64): Ditto.
(vbcaxq_u16): Ditto.
* config/arm/types.md (crypto_sha512): New type attribute.
(crypto_sha3): Ditto.
* doc/invoke.texi: Document new sha3 option.
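A hedged example of one of the new intrinsics (veor3q_u16 should map to
the three-way EOR3 instruction; requires the sha3 extension):

  #include <arm_neon.h>

  /* Three-way XOR in a single instruction: returns a ^ b ^ c.  */
  uint16x8_t
  xor3 (uint16x8_t a, uint16x8_t b, uint16x8_t c)
  {
    return veor3q_u16 (a, b, c);
  }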
2018-01-10 Michael Collison <michael.collison@arm.com>
* config/aarch64/aarch64-builtins.c:
(aarch64_types_quadopu_imm_qualifiers, TYPES_QUADOPUI): New.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_SM3 and __ARM_FEATURE_SM4 if TARGET_SM4 is true.
* config/aarch64/aarch64.h (AARCH64_FL_SM4): New flag.
(AARCH64_ISA_SM4): New ISA flag.
(TARGET_SM4): New feature flag for sm4.
* config/aarch64/aarch64-simd-builtins.def
(aarch64_sm3ss1qv4si): New builtin.
(aarch64_sm3tt1aq4si): Ditto.
(aarch64_sm3tt1bq4si): Ditto.
(aarch64_sm3tt2aq4si): Ditto.
(aarch64_sm3tt2bq4si): Ditto.
(aarch64_sm3partw1qv4si): Ditto.
(aarch64_sm3partw2qv4si): Ditto.
(aarch64_sm4eqv4si): Ditto.
(aarch64_sm4ekeyqv4si): Ditto.
* config/aarch64/aarch64-simd.md:
(aarch64_sm3ss1qv4si): New pattern.
(aarch64_sm3tt<sm3tt_op>qv4si): Ditto.
(aarch64_sm3partw<sm3part_op>qv4si): Ditto.
(aarch64_sm4eqv4si): Ditto.
(aarch64_sm4ekeyqv4si): Ditto.
* config/aarch64/iterators.md (sm3tt_op): New int attribute.
(sm3part_op): Ditto.
(CRYPTO_SM3TT): New int iterator.
(CRYPTO_SM3PART): Ditto.
(UNSPEC_SM3SS1): New unspec.
(UNSPEC_SM3TT1A): Ditto.
(UNSPEC_SM3TT1B): Ditto.
(UNSPEC_SM3TT2A): Ditto.
(UNSPEC_SM3TT2B): Ditto.
(UNSPEC_SM3PARTW1): Ditto.
(UNSPEC_SM3PARTW2): Ditto.
(UNSPEC_SM4E): Ditto.
(UNSPEC_SM4EKEY): Ditto.
* config/aarch64/constraints.md (Ui2): New constraint.
* config/aarch64/predicates.md (aarch64_imm2): New predicate.
* config/arm/types.md (crypto_sm3): New type attribute.
(crypto_sm4): Ditto.
* config/aarch64/arm_neon.h (vsm3ss1q_u32): New intrinsic.
(vsm3tt1aq_u32): Ditto.
(vsm3tt1bq_u32): Ditto.
(vsm3tt2aq_u32): Ditto.
(vsm3tt2bq_u32): Ditto.
(vsm3partw1q_u32): Ditto.
(vsm3partw2q_u32): Ditto.
(vsm4eq_u32): Ditto.
(vsm4ekeyq_u32): Ditto.
* doc/invoke.texi: Document new sm4 option.
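A short usage sketch, assuming the vsm4eq_u32 signature implied above
(state and round-key vectors in, updated state out; requires the sm4
extension):

  #include <arm_neon.h>

  /* One SM4E step: apply the SM4 round function to the current state
     using the given round keys.  */
  uint32x4_t
  sm4_round (uint32x4_t state, uint32x4_t key)
  {
    return vsm4eq_u32 (state, key);
  }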
2018-01-10 Michael Collison <michael.collison@arm.com>
* config/aarch64/aarch64-arches.def (armv8.4-a): New architecture.
* config/aarch64/aarch64.h (AARCH64_ISA_V8_4): New ISA flag.
(AARCH64_FL_FOR_ARCH8_4): New.
(AARCH64_FL_V8_4): New flag.
* doc/invoke.texi: Document new armv8.4-a option.
2018-01-10 Michael Collison <michael.collison@arm.com>
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_AES if TARGET_AES is true and __ARM_FEATURE_SHA2 if
TARGET_SHA2 is true.
* config/aarch64/aarch64-option-extension.def: Add
AARCH64_OPT_EXTENSION of 'sha2'.
(aes): Add AARCH64_OPT_EXTENSION of 'aes'.
(crypto): Disable sha2 and aes if crypto disabled.
(crypto): Enable aes and sha2 if crypto enabled.
(simd): Disable sha2 and aes if simd disabled.
* config/aarch64/aarch64.h (AARCH64_FL_AES, AARCH64_FL_SHA2):
New flags.
(AARCH64_ISA_AES, AARCH64_ISA_SHA2): New ISA flags.
(TARGET_SHA2): New feature flag for sha2.
(TARGET_AES): New feature flag for aes.
* config/aarch64/aarch64-simd.md:
(aarch64_crypto_aes<aes_op>v16qi): Make pattern
conditional on TARGET_AES.
(aarch64_crypto_aes<aesmc_op>v16qi): Ditto.
(aarch64_crypto_sha1hsi): Make pattern conditional
on TARGET_SHA2.
(aarch64_crypto_sha1hv4si): Ditto.
(aarch64_be_crypto_sha1hv4si): Ditto.
(aarch64_crypto_sha1su1v4si): Ditto.
(aarch64_crypto_sha1<sha1_op>v4si): Ditto.
(aarch64_crypto_sha1su0v4si): Ditto.
(aarch64_crypto_sha256h<sha256_op>v4si): Ditto.
(aarch64_crypto_sha256su0v4si): Ditto.
(aarch64_crypto_sha256su1v4si): Ditto.
* doc/invoke.texi: Document new aes and sha2 options.
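With crypto split into aes and sha2, user code can test the
finer-grained ACLE macros defined above; a sketch (vaeseq_u8 and
vaesmcq_u8 are pre-existing AES intrinsics, shown only to illustrate
the gating):

  #include <arm_neon.h>

  uint8x16_t
  aes_round (uint8x16_t state, uint8x16_t key)
  {
  #ifdef __ARM_FEATURE_AES
    /* AESE + AESMC, available when compiled with +aes.  */
    return vaesmcq_u8 (vaeseq_u8 (state, key));
  #else
    (void) key;
    return state;  /* fallback when the extension is disabled */
  #endif
  }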
From-SVN: r256478
|
|
This patch reworks aarch64_simd_valid_immediate so that
it's easier to add SVE support. The main changes are:
- make simd_immediate_info easier to construct
- replace the while (1) { ... break; } blocks with checks that use
the full 64-bit value of the constant
- treat floating-point modes as integers if they aren't valid
as floating-point values
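As a self-contained illustration of the second point (not the GCC code
itself): once the duplicated constant is available as a full 64-bit
value, checks that previously needed a while (1) { ... break; } scan
reduce to single comparisons, e.g. the "every byte equal" test behind a
byte-replicated MOVI immediate:

  #include <stdint.h>

  /* Is val64 one byte replicated eight times?  Multiplying the low byte
     by 0x0101010101010101 replicates it across all eight byte lanes.  */
  static int
  all_bytes_equal (uint64_t val64)
  {
    return val64 == (val64 & 0xff) * 0x0101010101010101ull;
  }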
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64-protos.h (aarch64_output_simd_mov_immediate):
Remove the mode argument.
(aarch64_simd_valid_immediate): Remove the mode and inverse
arguments.
* config/aarch64/iterators.md (bitsize): New iterator.
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<mode>, and<mode>3)
(ior<mode>3): Update calls to aarch64_output_simd_mov_immediate.
* config/aarch64/constraints.md (Do, Db, Dn): Update calls to
aarch64_simd_valid_immediate.
* config/aarch64/predicates.md (aarch64_reg_or_orr_imm): Likewise.
(aarch64_reg_or_bic_imm): Likewise.
* config/aarch64/aarch64.c (simd_immediate_info): Replace mvn
with an insn_type enum and msl with a modifier_type enum.
Replace element_width with a scalar_mode. Change the shift
to unsigned int. Add constructors for scalar_float_mode and
scalar_int_mode elements.
(aarch64_vect_float_const_representable_p): Delete.
(aarch64_can_const_movi_rtx_p)
(aarch64_simd_scalar_immediate_valid_for_move)
(aarch64_simd_make_constant): Update call to
aarch64_simd_valid_immediate.
(aarch64_advsimd_valid_immediate_hs): New function.
(aarch64_advsimd_valid_immediate): Likewise.
(aarch64_simd_valid_immediate): Remove mode and inverse
arguments. Rewrite to use the above. Use const_vec_duplicate_p
to detect duplicated constants and use aarch64_float_const_zero_rtx_p
and aarch64_float_const_representable_p on the result.
(aarch64_output_simd_mov_immediate): Remove mode argument.
Update call to aarch64_simd_valid_immediate and use of
simd_immediate_info.
(aarch64_output_scalar_simd_mov_immediate): Update call
accordingly.
gcc/testsuite/
* gcc.target/aarch64/vect-movi.c (movi_float_lsl24): New function.
(main): Call it.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256205
|
|
From-SVN: r256169
|
|
Previously aarch64_classify_address used an rtx code to distinguish
LDP/STP addresses from normal addresses; the code was PARALLEL
to select LDP/STP and anything else to select normal addresses.
This patch replaces that parameter with a dedicated enum.
The SVE port will add another enum value that doesn't map naturally
to an rtx code.
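A sketch of the shape of the change (the enumerator names here are
illustrative, following the description above, not necessarily the
exact source):

  /* Instead of overloading an rtx code (PARALLEL meant "LDP/STP
     address"), callers pass an explicit query type, which leaves room
     for values with no natural rtx equivalent.  */
  enum aarch64_addr_query_type
  {
    ADDR_QUERY_M,       /* an ordinary memory operand (the default) */
    ADDR_QUERY_LDP_STP  /* an LDP/STP pair address */
  };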
2017-12-21 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64-protos.h (aarch64_addr_query_type): New enum.
(aarch64_legitimate_address_p): Use it instead of an rtx code,
as an optional final parameter.
* config/aarch64/aarch64.c (aarch64_classify_address): Likewise.
(aarch64_legitimate_address_p): Likewise.
(aarch64_print_address_internal): Take an aarch64_addr_query_type
instead of an rtx code.
(aarch64_address_valid_for_prefetch_p): Update calls accordingly.
(aarch64_legitimate_address_hook_p): Likewise.
(aarch64_print_ldpstp_address): Likewise.
(aarch64_print_operand_address): Likewise.
(aarch64_address_cost): Likewise.
* config/aarch64/constraints.md (Uml, Umq, Ump, Utq): Likewise.
* config/aarch64/predicates.md (aarch64_mem_pair_operand): Likewise.
(aarch64_mem_pair_lanes_operand): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r255911
|