aboutsummaryrefslogtreecommitdiff
path: root/gcc/simplify-rtx.c
diff options
context:
space:
mode:
authorKyrylo Tkachov <kyrylo.tkachov@arm.com>2017-11-08 18:27:57 +0000
committerKyrylo Tkachov <ktkachov@gcc.gnu.org>2017-11-08 18:27:57 +0000
commit40757a25d45d47ddc50819bfd32dd6aac595abc2 (patch)
tree1fb9af62f06e3359a9e04ec7f5be8782fad1a802 /gcc/simplify-rtx.c
parent6432f025b4fccaaca8564e0c2518cdba869c4bf5 (diff)
downloadgcc-40757a25d45d47ddc50819bfd32dd6aac595abc2.zip
gcc-40757a25d45d47ddc50819bfd32dd6aac595abc2.tar.gz
gcc-40757a25d45d47ddc50819bfd32dd6aac595abc2.tar.bz2
vec_merge + vec_duplicate + vec_concat simplification
Another vec_merge simplification that's missing is transforming: (vec_merge (vec_duplicate x) (vec_concat (y) (z)) (const_int N)) into (vec_concat x z) if N == 1 (0b01) or (vec_concat y x) if N == 2 (0b10) For the testcase in this patch on aarch64 this allows us to try matching during combine the pattern: (set (reg:V2DI 78 [ x ]) (vec_concat:V2DI (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64]) (mem:DI (plus:DI (reg/v/f:DI 76 [ y ]) (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64]))) rather than the more complex: (set (reg:V2DI 78 [ x ]) (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (plus:DI (reg/v/f:DI 76 [ y ]) (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64])) (vec_duplicate:V2DI (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64])) (const_int 2 [0x2]))) We don't actually have an aarch64 pattern for the simplified version above, but it's a simple enough form to add, so this patch adds such a pattern that performs a concatenated load of two 64-bit vectors in adjacent memory locations as a single Q-register LDR. The new aarch64 pattern is needed to demonstrate the effectiveness of the simplify-rtx change, so I've kept them together as one patch. Now for the testcase in the patch we can generate: construct_lanedi: ldr q0, [x0] ret construct_lanedf: ldr q0, [x0] ret instead of: construct_lanedi: ld1r {v0.2d}, [x0] ldr x0, [x0, 8] ins v0.d[1], x0 ret construct_lanedf: ld1r {v0.2d}, [x0] ldr d1, [x0, 8] ins v0.d[1], v1.d[0] ret The new memory constraint Utq is needed because we need to allow only the Q-register addressing modes but the MEM expressions in the RTL pattern have 64-bit vector modes, and if we don't constrain them they will allow the D-register addressing modes during register allocation/address mode selection, which will produce invalid assembly. Bootstrapped and tested on aarch64-none-linux-gnu. * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE): Simplify vec_merge of vec_duplicate and vec_concat. * config/aarch64/constraints.md (Utq): New constraint. * config/aarch64/aarch64-simd.md (load_pair_lanes<mode>): New define_insn. * gcc.target/aarch64/load_v2vec_lanes_1.c: New test. From-SVN: r254549
Diffstat (limited to 'gcc/simplify-rtx.c')
-rw-r--r--gcc/simplify-rtx.c19
1 files changed, 19 insertions, 0 deletions
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 94302f6..92c783a 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5765,6 +5765,25 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
std::swap (newop0, newop1);
return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
}
+ /* Replace (vec_merge (vec_duplicate x) (vec_concat (y) (z)) (const_int N))
+ with (vec_concat x z) if N == 1, or (vec_concat y x) if N == 2.
+ Only applies for vectors of two elements. */
+ if (GET_CODE (op0) == VEC_DUPLICATE
+ && GET_CODE (op1) == VEC_CONCAT
+ && GET_MODE_NUNITS (GET_MODE (op0)) == 2
+ && GET_MODE_NUNITS (GET_MODE (op1)) == 2
+ && IN_RANGE (sel, 1, 2))
+ {
+ rtx newop0 = XEXP (op0, 0);
+ rtx newop1 = XEXP (op1, 2 - sel);
+ rtx otherop = XEXP (op1, sel - 1);
+ if (sel == 2)
+ std::swap (newop0, newop1);
+ /* Don't want to throw away the other part of the vec_concat if
+ it has side-effects. */
+ if (!side_effects_p (otherop))
+ return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
+ }
}
if (rtx_equal_p (op0, op1)