path: root/gcc/splay-tree-utils.tcc
author: Kyrylo Tkachov <ktkachov@nvidia.com> 2024-08-05 11:29:44 -0700
committer: Kyrylo Tkachov <ktkachov@nvidia.com> 2024-08-12 11:41:04 +0200
commit: fcc766c82cf8e0473ba54f1660c8282a7ce3231c
tree: 47efffe04c8e7d64e763367d485c190e4956c95f /gcc/splay-tree-utils.tcc
parent: 8d8db21eb726b785782f4a41ad85a0d4be63068a
aarch64: Emit ADD X, Y, Y instead of SHL X, Y, #1 for Advanced SIMD
On many cores, including Neoverse V2, the throughput of vector ADD
instructions is higher than that of vector shifts like SHL. We can lean on
that to emit code like:
	add	v0.4s, v0.4s, v0.4s
instead of:
	shl	v0.4s, v0.4s, 1

LLVM already does this trick.
In RTL the code gets canonicalised from (plus x x) to (ashift x 1), so I
opted to do this at the final assembly printing stage instead, similar
to how we emit CMLT instead of SSHR elsewhere in the backend.

I'd like to also do this for SVE shifts, but those will have to be
separate patches.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md
	(aarch64_simd_imm_shl<mode><vczle><vczbe>): Rewrite to new
	syntax.  Add =w,w,vs1 alternative.
	* config/aarch64/constraints.md (vs1): New constraint.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/advsimd_shl_add.c: New test.
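
The substitution rests on a left shift by one and an add of a value to itself giving the same result in every lane. The portable C sketch below illustrates that equivalence for four unsigned 32-bit lanes (the .4s arrangement in the message). It is not the code from the patch or from advsimd_shl_add.c; the names shl1, add_self and shl1_matches_add are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4 /* four 32-bit lanes, mirroring the .4s arrangement */

/* Lane-wise left shift by one: the (ashift x 1) form. */
static void shl1(const uint32_t *in, uint32_t *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = in[i] << 1;
}

/* Lane-wise add of the input to itself: the (plus x x) form. */
static void add_self(const uint32_t *in, uint32_t *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = in[i] + in[i];
}

/* Returns 1 if both forms agree on every lane, 0 otherwise. */
static int shl1_matches_add(const uint32_t *in)
{
  uint32_t a[LANES], b[LANES];

  assert(in != 0);
  shl1(in, a);
  add_self(in, b);
  for (int i = 0; i < LANES; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Both forms wrap modulo 2^32 on unsigned lanes, so they agree even when the top bit is shifted out. Whether a compiler actually emits ADD or SHL for either loop depends on the target and optimisation level; the sketch only demonstrates the arithmetic identity.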
Diffstat (limited to 'gcc/splay-tree-utils.tcc')
0 files changed, 0 insertions, 0 deletions