author     Richard Sandiford <richard.sandiford@arm.com>  2022-09-13 09:28:49 +0100
committer  Richard Sandiford <richard.sandiford@arm.com>  2022-09-13 09:28:49 +0100
commit     721c0fb3aca31d3bf8ad6e929eab32e29a427e60
tree       b925ba17f10063caf197befa3365547ea7f69dec
parent     91061fd5ace2b8ee6bf31bf5f5cbfdf55a25d5e1
aarch64: Vector move fixes for +nosimd
This patch fixes various issues around the handling of vectors
and (particularly) vector structures with +nosimd. Previously,
passing and returning such structures would trigger an ICE
(as in the example below), since:
* we didn't allow the structure modes to be stored in FPRs
* we didn't provide +nosimd move patterns
* splitting the moves into word-sized pieces (the default
strategy without move patterns) doesn't work because the
registers are doubleword sized.
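For example, a reduced test along the following lines (hypothetical,
not one of the new tests verbatim) would previously have triggered
the ICE when compiled with -O2 -march=armv8-a+nosimd, since the
tuple's structure mode was not allowed in FPRs and had no move
pattern to fall back on:

#include <arm_neon.h>

/* Passing and returning a two-vector tuple; with +nosimd this
   previously ICEd.  */
int8x16x2_t
copy (int8x16x2_t x)
{
  return x;
}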
The patch is a bit of a hodge-podge since a lot of the handling of
moves, register costs, and register legitimacy is so interconnected.
It didn't seem feasible to split things further.
Some notes:
* The patch recognises vector and tuple modes based on TARGET_FLOAT
rather than TARGET_SIMD, and instead adds TARGET_SIMD to places
that really do need the vector ISA. This is necessary for the
modes to be handled correctly in register arguments and returns.
* The 64-bit (DREG) STP peephole required TARGET_SIMD but the
LDP peephole didn't. I think the LDP one is right, since
DREG moves could involve GPRs as well as FPRs.
* The patch keeps the existing choices of instructions for
TARGET_SIMD, just in case they happen to be better than FMOV
on some uarches.
* Before the patch, +nosimd Q<->Q moves of 128-bit scalars went via
a GPR, thanks to a secondary reload pattern. This approach might
not be ideal, but there's no reason that 128-bit vectors should
behave differently from 128-bit scalars. The patch therefore
extends the current scalar approach to vectors (see the sketch
after these notes).
* Multi-vector LD1 and ST1 require TARGET_SIMD, so the TARGET_FLOAT
structure moves need to use LDP/STP and LDR/STR combinations
instead. That's also what we do for big-endian even with
TARGET_SIMD, so most of the code was already there. The patterns
for structures of 64-bit vectors are identical, but the patterns
for structures of 128-bit vectors need to cope with the lack of
128-bit Q<->Q moves.
It isn't feasible to move multi-vector tuples via GPRs, so the
patch moves them via memory instead. This contaminates the port
with its first secondary memory reload.
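As a concrete illustration of the Q<->Q point above, here is a
minimal compile-only sketch in the spirit of the new movv16qi_2.c
test; the pragma, options and function name are assumptions rather
than the committed test.  A w<-w copy of a 128-bit vector with
+nosimd cannot use a SIMD register move, so it is expected to follow
the same via-GPR path as 128-bit scalars:

#pragma GCC target ("+nosimd+fp")

typedef unsigned char v16qi __attribute__ ((vector_size (16)));

/* q0 and q1 arrive in FPRs; returning q1 in q0 needs a Q<->Q copy,
   which without SIMD is expected to be routed through general
   registers, matching the existing 128-bit scalar handling.  */
v16qi
fpr_to_fpr (v16qi q0, v16qi q1)
{
  return q1;
}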
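And a similar sketch for the structure moves, loosely modelled on the
new movv2x16qi_1.c test (again the names and options are assumptions,
not the test itself).  With TARGET_FLOAT but not TARGET_SIMD, the
tuple loads and stores are expected to come out as LDP/STP and
LDR/STR combinations rather than multi-vector LD1/ST1:

#include <arm_neon.h>

#pragma GCC target ("+nosimd+fp")

/* Copying a two-vector tuple through memory: the loads and stores
   must avoid the multi-register LD1/ST1 forms, which need SIMD.  */
void
copy_mem (int8x16x2_t *dst, const int8x16x2_t *src)
{
  *dst = *src;
}

/* Storing a tuple that arrives in Q registers.  */
void
store_arg (int8x16x2_t *dst, int8x16x2_t x)
{
  *dst = x;
}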
gcc/
* config/aarch64/aarch64.cc (aarch64_classify_vector_mode): Use
TARGET_FLOAT instead of TARGET_SIMD.
(aarch64_vectorize_related_mode): Restrict ADVSIMD handling to
TARGET_SIMD.
(aarch64_hard_regno_mode_ok): Don't allow tuples of 2 64-bit vectors
in GPRs.
(aarch64_classify_address): Treat little-endian structure moves
like big-endian for TARGET_FLOAT && !TARGET_SIMD.
(aarch64_secondary_memory_needed): New function.
(aarch64_secondary_reload): Handle 128-bit Advanced SIMD vectors
in the same way as TF, TI and TD.
(aarch64_rtx_mult_cost): Restrict ADVSIMD handling to TARGET_SIMD.
(aarch64_rtx_costs): Likewise.
(aarch64_register_move_cost): Treat a pair of 64-bit vectors
separately from a single 128-bit vector. Handle the cost implied
by aarch64_secondary_memory_needed.
(aarch64_simd_valid_immediate): Restrict ADVSIMD handling to
TARGET_SIMD.
(aarch64_expand_vec_perm_const_1): Likewise.
(TARGET_SECONDARY_MEMORY_NEEDED): New macro.
* config/aarch64/iterators.md (VTX): New iterator.
* config/aarch64/aarch64.md (arches): Add fp_q as a synonym of simd.
(arch_enabled): Adjust accordingly.
(@aarch64_reload_mov<TX:mode>): Extend to...
(@aarch64_reload_mov<VTX:mode>): ...this.
* config/aarch64/aarch64-simd.md (mov<mode>): Require TARGET_FLOAT
rather than TARGET_SIMD.
(movmisalign<mode>): Likewise.
(load_pair<DREG:mode><DREG2:mode>): Likewise.
(vec_store_pair<DREG:mode><DREG2:mode>): Likewise.
(load_pair<VQ:mode><VQ2:mode>): Likewise.
(vec_store_pair<VQ:mode><VQ2:mode>): Likewise.
(@aarch64_split_simd_mov<mode>): Likewise.
(aarch64_get_low<mode>): Likewise.
(aarch64_get_high<mode>): Likewise.
(aarch64_get_half<mode>): Likewise. Canonicalize to a move for
lowpart extracts.
(*aarch64_simd_mov<VDMOV:mode>): Require TARGET_FLOAT rather than
TARGET_SIMD. Use different w<-w and r<-w instructions for
!TARGET_SIMD. Disable immediate moves for !TARGET_SIMD but
add an alternative specifically for w<-Z.
(*aarch64_simd_mov<VQMOV:mode>): Require TARGET_FLOAT rather than
TARGET_SIMD. Likewise for the associated define_splits. Disable
FPR moves and immediate moves for !TARGET_SIMD but add an alternative
specifically for w<-Z.
(aarch64_simd_mov_from_<mode>high): Require TARGET_FLOAT rather than
TARGET_SIMD. Restrict the existing alternatives to TARGET_SIMD
but add a new r<-w one for !TARGET_SIMD.
(*aarch64_get_high<mode>): New pattern.
(load_pair_lanes<mode>): Require TARGET_FLOAT rather than TARGET_SIMD.
(store_pair_lanes<mode>): Likewise.
(*aarch64_combine_internal<mode>): Likewise. Restrict existing
w<-w, w<-r and w<-m alternatives to TARGET_SIMD but add a new w<-r
alternative for !TARGET_SIMD.
(*aarch64_combine_internal_be<mode>): Likewise.
(aarch64_combinez<mode>): Require TARGET_FLOAT rather than TARGET_SIMD.
Remove bogus arch attribute.
(*aarch64_combinez_be<mode>): Likewise.
(@aarch64_vec_concat<mode>): Require TARGET_FLOAT rather than
TARGET_SIMD.
(aarch64_combine<mode>): Likewise.
(aarch64_rev_reglist<mode>): Likewise.
(mov<mode>): Likewise.
(*aarch64_be_mov<VSTRUCT_2D:mode>): Extend to TARGET_FLOAT &&
!TARGET_SIMD, regardless of endianness. Extend associated
define_splits in the same way, both for this pattern and the
ones below.
(*aarch64_be_mov<VSTRUCT_2Q:mode>): Likewise. Restrict w<-w
alternative to TARGET_SIMD.
(*aarch64_be_movoi): Likewise.
(*aarch64_be_movci): Likewise.
(*aarch64_be_movxi): Likewise.
(*aarch64_be_mov<VSTRUCT_4QD:mode>): Extend to TARGET_FLOAT
&& !TARGET_SIMD, regardless of endianness. Restrict w<-w alternative
to TARGET_SIMD for tuples of 128-bit vectors.
(*aarch64_be_mov<VSTRUCT_4QD:mode>): Likewise.
* config/aarch64/aarch64-ldpstp.md: Remove TARGET_SIMD condition
from DREG STP peephole. Change TARGET_SIMD to TARGET_FLOAT in
the VQ and VP_2E LDP and STP peepholes.
gcc/testsuite/
* gcc.target/aarch64/ldp_stp_20.c: New test.
* gcc.target/aarch64/ldp_stp_21.c: Likewise.
* gcc.target/aarch64/ldp_stp_22.c: Likewise.
* gcc.target/aarch64/ldp_stp_23.c: Likewise.
* gcc.target/aarch64/ldp_stp_24.c: Likewise.
* gcc.target/aarch64/movv16qi_1.c (gpr_to_gpr): New function.
* gcc.target/aarch64/movv8qi_1.c (gpr_to_gpr): Likewise.
* gcc.target/aarch64/movv16qi_2.c: New test.
* gcc.target/aarch64/movv16qi_3.c: Likewise.
* gcc.target/aarch64/movv2di_1.c: Likewise.
* gcc.target/aarch64/movv2x16qi_1.c: Likewise.
* gcc.target/aarch64/movv2x8qi_1.c: Likewise.
* gcc.target/aarch64/movv3x16qi_1.c: Likewise.
* gcc.target/aarch64/movv3x8qi_1.c: Likewise.
* gcc.target/aarch64/movv4x16qi_1.c: Likewise.
* gcc.target/aarch64/movv4x8qi_1.c: Likewise.
* gcc.target/aarch64/movv8qi_2.c: Likewise.
* gcc.target/aarch64/movv8qi_3.c: Likewise.
* gcc.target/aarch64/vect_unary_2.c: Likewise.