author	Richard Sandiford <richard.sandiford@arm.com>	2022-09-13 09:28:49 +0100
committer	Richard Sandiford <richard.sandiford@arm.com>	2022-09-13 09:28:49 +0100
commit	721c0fb3aca31d3bf8ad6e929eab32e29a427e60 (patch)
tree	b925ba17f10063caf197befa3365547ea7f69dec
parent	91061fd5ace2b8ee6bf31bf5f5cbfdf55a25d5e1 (diff)
aarch64: Vector move fixes for +nosimd
This patch fixes various issues around the handling of vectors and
(particularly) vector structures with +nosimd.  Previously, passing
and returning structures would trigger an ICE, since:

* we didn't allow the structure modes to be stored in FPRs

* we didn't provide +nosimd move patterns

* splitting the moves into word-sized pieces (the default strategy
  without move patterns) doesn't work because the registers are
  doubleword sized.

The patch is a bit of a hodge-podge since a lot of the handling of
moves, register costs, and register legitimacy is so interconnected.
It didn't seem feasible to split things further.

Some notes:

* The patch recognises vector and tuple modes based on TARGET_FLOAT
  rather than TARGET_SIMD, and instead adds TARGET_SIMD to places
  that really do need the vector ISA.  This is necessary for the
  modes to be handled correctly in register arguments and returns.

* The 64-bit (DREG) STP peephole required TARGET_SIMD but the LDP
  peephole didn't.  I think the LDP one is right, since DREG moves
  could involve GPRs as well as FPRs.

* The patch keeps the existing choices of instructions for
  TARGET_SIMD, just in case they happen to be better than FMOV on
  some uarches.

* Before the patch, +nosimd Q<->Q moves of 128-bit scalars went via
  a GPR, thanks to a secondary reload pattern.  This approach might
  not be ideal, but there's no reason that 128-bit vectors should
  behave differently from 128-bit scalars.  The patch therefore
  extends the current scalar approach to vectors.

* Multi-vector LD1 and ST1 require TARGET_SIMD, so the TARGET_FLOAT
  structure moves need to use LDP/STP and LDR/STR combinations
  instead.  That's also what we do for big-endian even with
  TARGET_SIMD, so most of the code was already there.

  The patterns for structures of 64-bit vectors are identical, but
  the patterns for structures of 128-bit vectors need to cope with
  the lack of 128-bit Q<->Q moves.  It isn't feasible to move
  multi-vector tuples via GPRs, so the patch moves them via memory
  instead.  This contaminates the port with its first secondary
  memory reload.
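As a rough illustration of the ICE (this snippet is not taken from the
patch or its testsuite), a function that passes and returns a vector
structure such as int8x16x2_t uses the FP/SIMD registers for the ABI,
so it is compilable in principle with FP enabled and SIMD disabled
(e.g. -O2 -march=armv8-a+nosimd), but previously it hit the problems
listed above:

  /* Hypothetical example, compiled with FP enabled but SIMD disabled.
     The structure is passed and returned in Q registers, so the
     compiler needs +nosimd moves for the structure mode.  */
  #include <arm_neon.h>

  int8x16x2_t
  swap_halves (int8x16x2_t x)
  {
    int8x16_t tmp = x.val[0];
    x.val[0] = x.val[1];
    x.val[1] = tmp;
    return x;
  }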
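The Q<->Q point can be seen with a register-pinned copy in the style
of the existing movv16qi tests (the snippet below is only a sketch in
that style, not the actual test source): without SIMD there is no
"mov vD.16b, vS.16b", so the 128-bit copy is expected to go through
general registers, just as it already did for 128-bit scalars.

  typedef unsigned char v16qi __attribute__ ((vector_size (16)));

  /* Hypothetical test-style snippet: pin two values to q0 and q1 and
     copy one to the other, forcing a 128-bit FPR-to-FPR move.  */
  void
  fpr_to_fpr (void)
  {
    register v16qi x asm ("v0");
    register v16qi y asm ("v1");
    asm volatile ("" : "=w" (y));
    x = y;
    asm volatile ("" :: "w" (x));
  }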
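The secondary memory reload mentioned in the last note is provided
through the TARGET_SECONDARY_MEMORY_NEEDED hook.  A minimal sketch of
the idea, assuming the usual aarch64 helpers and deliberately less
precise than the patch itself (the real code only forces tuples of
128-bit vectors through memory), might look like:

  /* Sketch only, not the code added by the patch: without the SIMD
     ISA there is no cheap Q<->Q move to split a multi-vector copy
     into, so send FPR<->FPR copies of structure modes via memory.  */
  static bool
  aarch64_secondary_memory_needed (machine_mode mode, reg_class_t class1,
				   reg_class_t class2)
  {
    if (!TARGET_SIMD
	&& reg_classes_intersect_p (class1, FP_REGS)
	&& reg_classes_intersect_p (class2, FP_REGS))
      {
	unsigned int vec_flags = aarch64_classify_vector_mode (mode);
	if (vec_flags & VEC_STRUCT)
	  return true;
      }
    return false;
  }

  #undef TARGET_SECONDARY_MEMORY_NEEDED
  #define TARGET_SECONDARY_MEMORY_NEEDED aarch64_secondary_memory_needed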
gcc/
	* config/aarch64/aarch64.cc (aarch64_classify_vector_mode): Use
	TARGET_FLOAT instead of TARGET_SIMD.
	(aarch64_vectorize_related_mode): Restrict ADVSIMD handling to
	TARGET_SIMD.
	(aarch64_hard_regno_mode_ok): Don't allow tuples of 2 64-bit vectors
	in GPRs.
	(aarch64_classify_address): Treat little-endian structure moves
	like big-endian for TARGET_FLOAT && !TARGET_SIMD.
	(aarch64_secondary_memory_needed): New function.
	(aarch64_secondary_reload): Handle 128-bit Advanced SIMD vectors
	in the same way as TF, TI and TD.
	(aarch64_rtx_mult_cost): Restrict ADVSIMD handling to TARGET_SIMD.
	(aarch64_rtx_costs): Likewise.
	(aarch64_register_move_cost): Treat a pair of 64-bit vectors
	separately from a single 128-bit vector.  Handle the cost implied
	by aarch64_secondary_memory_needed.
	(aarch64_simd_valid_immediate): Restrict ADVSIMD handling to
	TARGET_SIMD.
	(aarch64_expand_vec_perm_const_1): Likewise.
	(TARGET_SECONDARY_MEMORY_NEEDED): New macro.
	* config/aarch64/iterators.md (VTX): New iterator.
	* config/aarch64/aarch64.md (arches): Add fp_q as a synonym of simd.
	(arch_enabled): Adjust accordingly.
	(@aarch64_reload_mov<TX:mode>): Extend to...
	(@aarch64_reload_mov<VTX:mode>): ...this.
	* config/aarch64/aarch64-simd.md (mov<mode>): Require TARGET_FLOAT
	rather than TARGET_SIMD.
	(movmisalign<mode>): Likewise.
	(load_pair<DREG:mode><DREG2:mode>): Likewise.
	(vec_store_pair<DREG:mode><DREG2:mode>): Likewise.
	(load_pair<VQ:mode><VQ2:mode>): Likewise.
	(vec_store_pair<VQ:mode><VQ2:mode>): Likewise.
	(@aarch64_split_simd_mov<mode>): Likewise.
	(aarch64_get_low<mode>): Likewise.
	(aarch64_get_high<mode>): Likewise.
	(aarch64_get_half<mode>): Likewise.  Canonicalize to a move for
	lowpart extracts.
	(*aarch64_simd_mov<VDMOV:mode>): Require TARGET_FLOAT rather than
	TARGET_SIMD.  Use different w<-w and r<-w instructions for
	!TARGET_SIMD.  Disable immediate moves for !TARGET_SIMD but add
	an alternative specifically for w<-Z.
	(*aarch64_simd_mov<VQMOV:mode>): Require TARGET_FLOAT rather than
	TARGET_SIMD.  Likewise for the associated define_splits.  Disable
	FPR moves and immediate moves for !TARGET_SIMD but add an
	alternative specifically for w<-Z.
	(aarch64_simd_mov_from_<mode>high): Require TARGET_FLOAT rather
	than TARGET_SIMD.  Restrict the existing alternatives to
	TARGET_SIMD but add a new r<-w one for !TARGET_SIMD.
	(*aarch64_get_high<mode>): New pattern.
	(load_pair_lanes<mode>): Require TARGET_FLOAT rather than
	TARGET_SIMD.
	(store_pair_lanes<mode>): Likewise.
	(*aarch64_combine_internal<mode>): Likewise.  Restrict existing
	w<-w, w<-r and w<-m alternatives to TARGET_SIMD but add a new
	w<-r alternative for !TARGET_SIMD.
	(*aarch64_combine_internal_be<mode>): Likewise.
	(aarch64_combinez<mode>): Require TARGET_FLOAT rather than
	TARGET_SIMD.  Remove bogus arch attribute.
	(*aarch64_combinez_be<mode>): Likewise.
	(@aarch64_vec_concat<mode>): Require TARGET_FLOAT rather than
	TARGET_SIMD.
	(aarch64_combine<mode>): Likewise.
	(aarch64_rev_reglist<mode>): Likewise.
	(mov<mode>): Likewise.
	(*aarch64_be_mov<VSTRUCT_2D:mode>): Extend to TARGET_FLOAT
	&& !TARGET_SIMD, regardless of endianness.  Extend associated
	define_splits in the same way, both for this pattern and the
	ones below.
	(*aarch64_be_mov<VSTRUCT_2Q:mode>): Likewise.  Restrict w<-w
	alternative to TARGET_SIMD.
	(*aarch64_be_movoi): Likewise.
	(*aarch64_be_movci): Likewise.
	(*aarch64_be_movxi): Likewise.
	(*aarch64_be_mov<VSTRUCT_4QD:mode>): Extend to TARGET_FLOAT
	&& !TARGET_SIMD, regardless of endianness.  Restrict w<-w
	alternative to TARGET_SIMD for tuples of 128-bit vectors.
	(*aarch64_be_mov<VSTRUCT_4QD:mode>): Likewise.
	* config/aarch64/aarch64-ldpstp.md: Remove TARGET_SIMD condition
	from DREG STP peephole.  Change TARGET_SIMD to TARGET_FLOAT in
	the VQ and VP_2E LDP and STP peepholes.

gcc/testsuite/
	* gcc.target/aarch64/ldp_stp_20.c: New test.
	* gcc.target/aarch64/ldp_stp_21.c: Likewise.
	* gcc.target/aarch64/ldp_stp_22.c: Likewise.
	* gcc.target/aarch64/ldp_stp_23.c: Likewise.
	* gcc.target/aarch64/ldp_stp_24.c: Likewise.
	* gcc.target/aarch64/movv16qi_1.c (gpr_to_gpr): New function.
	* gcc.target/aarch64/movv8qi_1.c (gpr_to_gpr): Likewise.
	* gcc.target/aarch64/movv16qi_2.c: New test.
	* gcc.target/aarch64/movv16qi_3.c: Likewise.
	* gcc.target/aarch64/movv2di_1.c: Likewise.
	* gcc.target/aarch64/movv2x16qi_1.c: Likewise.
	* gcc.target/aarch64/movv2x8qi_1.c: Likewise.
	* gcc.target/aarch64/movv3x16qi_1.c: Likewise.
	* gcc.target/aarch64/movv3x8qi_1.c: Likewise.
	* gcc.target/aarch64/movv4x16qi_1.c: Likewise.
	* gcc.target/aarch64/movv4x8qi_1.c: Likewise.
	* gcc.target/aarch64/movv8qi_2.c: Likewise.
	* gcc.target/aarch64/movv8qi_3.c: Likewise.
	* gcc.target/aarch64/vect_unary_2.c: Likewise.