riscv-gnu-toolchain/gcc.git

author	Kyrylo Tkachov <ktkachov@nvidia.com>	2024-10-09 09:40:33 -0700
committer	Kyrylo Tkachov <ktkachov@nvidia.com>	2024-10-11 17:23:19 +0200
commit	1dcc6a1a67165a469d4cd9b6b39514c46cc656ad (patch)
tree	53c9c71e537edcd2d78d7dac2d336f5da26dfdd3 /gcc/df-problems.cc
parent	70566e719f0710323251e8e9190b322f4de8faeb (diff)
download	gcc-1dcc6a1a67165a469d4cd9b6b39514c46cc656ad.zip gcc-1dcc6a1a67165a469d4cd9b6b39514c46cc656ad.tar.gz gcc-1dcc6a1a67165a469d4cd9b6b39514c46cc656ad.tar.bz2

PR target/117048 aarch64: Use more canonical and optimization-friendly representation for XAR instruction

The pattern for the Advanced SIMD XAR instruction isn't very
optimization-friendly at the moment.
In the testcase from the PR, once simplify-rtx has done its work it
generates the RTL:

    (set (reg:V2DI 119 [ _14 ])
        (rotate:V2DI (xor:V2DI (reg:V2DI 114 [ vect__1.12_16 ])
                (reg:V2DI 116 [ *m1_01_8(D) ]))
            (const_vector:V2DI [
                    (const_int 32 [0x20]) repeated x2
                ])))

which fails to match our XAR pattern because the pattern expects:
1) A ROTATERT instead of the ROTATE. However, according to the RTL ops
documentation the preferred form of rotate-by-immediate is ROTATE, which
I take to mean it's the canonical form.
ROTATE (x, C) <-> ROTATERT (x, MODE_WIDTH - C) so it's better to match just
one canonical representation.
2) A CONST_INT shift amount whereas the midend asks for a repeated vector
constant.

These issues are fixed by introducing a dedicated expander for the
aarch64_xarqv2di name, needed by the arm_neon.h intrinsic, that translates
the intrinsic-level CONST_INT immediate (the right-rotate amount) into
a repeated vector constant subtracted from 64 to give the corresponding
left-rotate amount that is fed to the new representation for the XAR
define_insn that uses the ROTATE RTL code. This is a similar approach to
how we handle the discrepancy between intrinsic-level and RTL-level
vector lane numbers for big-endian.

With this patch and [1/2] the arithmetic parts of the testcase now simplify
to just one XAR instruction.

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

gcc/
	PR target/117048
	* config/aarch64/aarch64-simd.md (aarch64_xarqv2di): Redefine into a
	define_expand.
	(*aarch64_xarqv2di_insn): Define.

gcc/testsuite/
	PR target/117048
	* g++.target/aarch64/pr117048.C: New test.
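
The rotate identity the commit message relies on, ROTATE (x, C) <-> ROTATERT (x, MODE_WIDTH - C), can be checked on a single 64-bit lane with a small scalar sketch. This is only an illustration in plain C, not code from the patch: the helper names (rotl64, rotr64, xar_lane_rotatert, xar_lane_rotate) are hypothetical, and reducing the rotate amount modulo 64 is an assumption made here so that an amount of 64 is well defined in C.

```c
#include <stdint.h>

/* Rotate a 64-bit value left by n bits. The amount is reduced modulo 64
   (an assumption of this sketch) so that n == 0 and n == 64 avoid
   undefined shifts in C. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Rotate a 64-bit value right by n bits, with the same modulo-64 rule. */
static uint64_t rotr64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x >> n) | (x << (64 - n)) : x;
}

/* Intrinsic-level view of one XAR lane: XOR the inputs, then rotate
   right by the immediate (the ROTATERT form). */
static uint64_t xar_lane_rotatert(uint64_t a, uint64_t b, unsigned imm)
{
    return rotr64(a ^ b, imm);
}

/* Canonical view of the same lane: XOR the inputs, then rotate left by
   64 - imm (the ROTATE form the commit message describes). */
static uint64_t xar_lane_rotate(uint64_t a, uint64_t b, unsigned imm)
{
    return rotl64(a ^ b, 64 - imm);
}
```

For every immediate in the range 0 to 63 the two lane functions agree, which is why matching only the ROTATE form loses nothing. With a rotate amount of 32, as in the RTL from the PR, the left and right amounts coincide.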

Diffstat (limited to 'gcc/df-problems.cc')

0 files changed, 0 insertions, 0 deletions

