path: root/gcc/jit
author: Tamar Christina <tamar.christina@arm.com> 2024-07-05 12:10:39 +0100
committer: Tamar Christina <tamar.christina@arm.com> 2024-07-05 12:10:39 +0100
commit: 97fcfeac3dcc433b792711fd840b92fa3e860733
tree: 048d7dd2a3b8f4aa30282ce6e07ccdb3a1e26df0 /gcc/jit
parent: 6ff698106644af39da9e0eda51974fdcd111280d
AArch64: lower 2 reg TBL permutes with one zero register to 1 reg TBL.

When a two reg TBL is performed with one operand being a zero vector we can
instead use a single reg TBL and map the indices for accessing the zero vector
to an out of range constant.

On AArch64 out of range indices into a TBL have a defined semantics of setting
the element to zero.  Many uArches have a slower 2-reg TBL than 1-reg TBL.

Before this change we had:

typedef unsigned int v4si __attribute__ ((vector_size (16)));

v4si f1 (v4si a)
{
  v4si zeros = {0,0,0,0};
  return __builtin_shufflevector (a, zeros, 0, 5, 1, 6);
}

which generates:

f1:
        mov     v30.16b, v0.16b
        movi    v31.4s, 0
        adrp    x0, .LC0
        ldr     q0, [x0, #:lo12:.LC0]
        tbl     v0.16b, {v30.16b - v31.16b}, v0.16b
        ret

.LC0:
        .byte   0
        .byte   1
        .byte   2
        .byte   3
        .byte   20
        .byte   21
        .byte   22
        .byte   23
        .byte   4
        .byte   5
        .byte   6
        .byte   7
        .byte   24
        .byte   25
        .byte   26
        .byte   27

and with the patch:

f1:
        adrp    x0, .LC0
        ldr     q31, [x0, #:lo12:.LC0]
        tbl     v0.16b, {v0.16b}, v31.16b
        ret

.LC0:
        .byte   0
        .byte   1
        .byte   2
        .byte   3
        .byte   -1
        .byte   -1
        .byte   -1
        .byte   -1
        .byte   4
        .byte   5
        .byte   6
        .byte   7
        .byte   -1
        .byte   -1
        .byte   -1
        .byte   -1

This sequence is generated often by openmp and aside from the
strict performance impact of this change, it also gives better register
allocation as we no longer have the consecutive register limitation.

gcc/ChangeLog:

        * config/aarch64/aarch64.cc (struct expand_vec_perm_d): Add zero_op0_p
        and zero_op_p1.
        (aarch64_evpc_tbl): Implement register value remapping.
        (aarch64_vectorize_vec_perm_const): Detect if operand is a zero dup
        before it's forced to a reg.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/tbl_with_zero_1.c: New test.
        * gcc.target/aarch64/tbl_with_zero_2.c: New test.

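
The index remapping described in the commit message can be modelled in plain C. The following is only an illustrative sketch, not the code added to aarch64.cc: the names tbl1, tbl2, remap_selector, VEC_BYTES and OUT_OF_RANGE are hypothetical, and the TBL behaviour is emulated in software on 16-byte arrays. It assumes exactly one operand is known to be all zeros and that every selector byte is in the range 0..31.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16
#define OUT_OF_RANGE 0xff /* same bit pattern as the ".byte -1" entries */

/* Software model of a one-register TBL on 16-byte vectors:
   any index >= 16 produces a zero byte.  */
static void
tbl1 (uint8_t dst[VEC_BYTES], const uint8_t table[VEC_BYTES],
      const uint8_t sel[VEC_BYTES])
{
  for (size_t i = 0; i < VEC_BYTES; i++)
    dst[i] = sel[i] < VEC_BYTES ? table[sel[i]] : 0;
}

/* Software model of a two-register TBL: indices 0..15 read from LO,
   16..31 read from HI, anything larger produces a zero byte.  */
static void
tbl2 (uint8_t dst[VEC_BYTES], const uint8_t lo[VEC_BYTES],
      const uint8_t hi[VEC_BYTES], const uint8_t sel[VEC_BYTES])
{
  for (size_t i = 0; i < VEC_BYTES; i++)
    {
      uint8_t idx = sel[i];
      if (idx < VEC_BYTES)
        dst[i] = lo[idx];
      else if (idx < 2 * VEC_BYTES)
        dst[i] = hi[idx - VEC_BYTES];
      else
        dst[i] = 0;
    }
}

/* Rewrite a two-input selector SEL into a one-input selector OUT.
   ZERO_OP is 0 if the first operand is the zero vector and 1 if the
   second one is.  Indices that pointed into the zero operand become
   OUT_OF_RANGE; the others are rebased onto the remaining register.  */
static void
remap_selector (uint8_t out[VEC_BYTES], const uint8_t sel[VEC_BYTES],
                int zero_op)
{
  assert (zero_op == 0 || zero_op == 1);
  for (size_t i = 0; i < VEC_BYTES; i++)
    {
      uint8_t idx = sel[i];
      assert (idx < 2 * VEC_BYTES);
      if (idx / VEC_BYTES == zero_op)
        out[i] = OUT_OF_RANGE;
      else
        out[i] = idx % VEC_BYTES;
    }
}
```

Feeding the first .LC0 table from the commit message through remap_selector with zero_op set to 1 yields the second .LC0 table, and tbl1 with the remapped selector returns the same bytes as tbl2 with the original selector and an all-zero second operand.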
Diffstat (limited to 'gcc/jit')
0 files changed, 0 insertions, 0 deletions