author | Roger Sayle <roger@nextmovesoftware.com> | 2021-11-02 21:58:32 +0000 |
---|---|---|
committer | Roger Sayle <roger@nextmovesoftware.com> | 2021-11-02 21:58:32 +0000 |
commit | 2a83259f837e5cbd39467a3faf954b51d9d13664 (patch) | |
tree | 2831f9357c817abdde9cb326acfd2a1a0c0f3499 /gcc/gcov.c | |
parent | 18f0873d1e595dc2e5db738550e6e2b0e2953d84 (diff) | |
x86_64: Improved implementation of TImode rotations.
This simple patch improves the implementation of 128-bit (TImode)
rotations on x86_64 (a missed optimization opportunity spotted
during the recent V1TImode improvements).
Currently, the function:
unsigned __int128 rotrti3(unsigned __int128 x, unsigned int i) {
  return (x >> i) | (x << (128-i));
}
produces:
rotrti3:
        movq    %rsi, %r8
        movq    %rdi, %r9
        movl    %edx, %ecx
        movq    %rdi, %rsi
        movq    %r9, %rax
        movq    %r8, %rdx
        movq    %r8, %rdi
        shrdq   %r8, %rax
        shrq    %cl, %rdx
        xorl    %r8d, %r8d
        testb   $64, %cl
        cmovne  %rdx, %rax
        cmovne  %r8, %rdx
        negl    %ecx
        andl    $127, %ecx
        shldq   %r9, %rdi
        salq    %cl, %rsi
        xorl    %r9d, %r9d
        testb   $64, %cl
        cmovne  %rsi, %rdi
        cmovne  %r9, %rsi
        orq     %rdi, %rdx
        orq     %rsi, %rax
        ret
With this patch, GCC will now generate the much nicer:
rotrti3:
        movl    %edx, %ecx
        movq    %rdi, %rdx
        shrdq   %rsi, %rdx
        shrdq   %rdi, %rsi
        andl    $64, %ecx
        movq    %rdx, %rax
        cmove   %rsi, %rdx
        cmovne  %rsi, %rax
        ret
Even I wasn't expecting the optimizer's choice of the final three
instructions; a thing of beauty. For rotations by 64 or more,
the lowpart and the highpart (%rax and %rdx) are transposed, and
it would be nice to have a conditional swap/exchange. The inspired
solution the compiler comes up with is to store/duplicate the same
value in both %rax/%rdx, and then use complementary conditional moves
to either update the lowpart or highpart, which cleverly avoids the
potential decode-stage pipeline stall (on some microarchitectures)
from having multiple instructions conditional on the same condition.
See X86_TUNE_ONE_IF_CONV_INSN, and notice there are two such stalls
in the original expansion of rot[rl]ti3.
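
As an illustration only, the new sequence can be modelled in portable C.
This is a sketch of what the generated instructions compute, not the
expander code in i386.md; the names shrd64, u128_parts and rotr128_model
are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Hypothetical model of one x86-64 SHRD: shift `lo` right by (n & 63),
   filling the vacated high bits from the low bits of `hi`.  */
static uint64_t shrd64(uint64_t lo, uint64_t hi, unsigned n)
{
    n &= 63;                 /* the hardware masks the count to 6 bits */
    if (n == 0)
        return lo;           /* avoid an undefined C shift by 64 below */
    return (lo >> n) | (hi << (64 - n));
}

/* Illustrative pair type standing in for %rax (lo) and %rdx (hi).  */
typedef struct { uint64_t lo, hi; } u128_parts;

/* Model of the new rotrti3 sequence: two double-word shifts by i mod 64,
   then a swap of the halves when bit 6 of the count is set.  */
static u128_parts rotr128_model(uint64_t lo, uint64_t hi, unsigned i)
{
    uint64_t a = shrd64(lo, hi, i);   /* shrdq %rsi, %rdx */
    uint64_t b = shrd64(hi, lo, i);   /* shrdq %rdi, %rsi */
    u128_parts r;

    /* andl $64, %ecx followed by the complementary cmove/cmovne pair:
       exactly one of the two halves is overwritten.  */
    r.lo = (i & 64) ? b : a;
    r.hi = (i & 64) ? a : b;
    return r;
}
```

In this model a count of 64 leaves both shifts as no-ops and only swaps
the halves, which is the case the cmove/cmovne pair handles.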
2021-11-02 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
* config/i386/i386.md (<any_rotate>ti3): Provide expansion for
rotations by non-constant amounts.