diff options
author | Roger Sayle <roger@nextmovesoftware.com> | 2021-10-26 08:33:41 +0100 |
---|---|---|
committer | Roger Sayle <roger@nextmovesoftware.com> | 2021-10-26 08:33:41 +0100 |
commit | 6b8b25575570ffde37cc8997af096514b929779d (patch) | |
tree | ee8923431f1ed44c3f58d7cf988326d63b517109 /libiberty/xatexit.c | |
parent | 4e417eea8f3f14131f7370f9bd5dd568611d11df (diff) | |
download | gcc-6b8b25575570ffde37cc8997af096514b929779d.zip gcc-6b8b25575570ffde37cc8997af096514b929779d.tar.gz gcc-6b8b25575570ffde37cc8997af096514b929779d.tar.bz2 |
x86_64: Implement V1TI mode shifts/rotates by a constant
This patch provides RTL expanders to implement logical shifts and
rotates of 128-bit values (stored in vector integer registers) by
constant bit counts. Previously, GCC would transfer these values
to a pair of integer registers (TImode) via memory to perform the
operation, then transfer the result back via memory. Instead these
operations are now expanded using (between 1 and 5) SSE2 vector
instructions.
Logical shifts by multiples of 8 can be implemented using x86_64's
pslldq/psrldq instruction:
ashl_8: pslldq $1, %xmm0
ret
lshr_32:
psrldq $4, %xmm0
ret
Logical shifts by greater than 64 can use pslldq/psrldq $8, followed
by a psllq/psrlq for the remaining bits:
ashl_111:
pslldq $8, %xmm0
psllq $47, %xmm0
ret
lshr_127:
psrldq $8, %xmm0
psrlq $63, %xmm0
ret
The remaining logical shifts make use of the following idiom:
ashl_1:
movdqa %xmm0, %xmm1
psllq $1, %xmm0
pslldq $8, %xmm1
psrlq $63, %xmm1
por %xmm1, %xmm0
ret
lshr_15:
movdqa %xmm0, %xmm1
psrlq $15, %xmm0
psrldq $8, %xmm1
psllq $49, %xmm1
por %xmm1, %xmm0
ret
Rotates by multiples of 32 can use x86_64's pshufd:
rotr_32:
pshufd $57, %xmm0, %xmm0
ret
rotr_64:
pshufd $78, %xmm0, %xmm0
ret
rotr_96:
pshufd $147, %xmm0, %xmm0
ret
Rotates by multiples of 8 (other than multiples of 32) can make
use of both pslldq and psrldq, followed by por:
rotr_8:
movdqa %xmm0, %xmm1
psrldq $1, %xmm0
pslldq $15, %xmm1
por %xmm1, %xmm0
ret
rotr_112:
movdqa %xmm0, %xmm1
psrldq $14, %xmm0
pslldq $2, %xmm1
por %xmm1, %xmm0
ret
And the remaining rotates use one or two pshufd, followed by a
psrld/pslld/por sequence:
rotr_1:
movdqa %xmm0, %xmm1
pshufd $57, %xmm0, %xmm0
psrld $1, %xmm1
pslld $31, %xmm0
por %xmm1, %xmm0
ret
rotr_63:
pshufd $78, %xmm0, %xmm1
pshufd $57, %xmm0, %xmm0
pslld $1, %xmm1
psrld $31, %xmm0
por %xmm1, %xmm0
ret
rotr_111:
pshufd $147, %xmm0, %xmm1
pslld $17, %xmm0
psrld $15, %xmm1
por %xmm1, %xmm0
ret
The new test case, sse2-v1ti-shift.c, is a run-time check to confirm that
the results of V1TImode shifts/rotates by constants, exactly match the
expected results of TImode operations, for various input test vectors.
2021-10-26 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.c (ix86_expand_v1ti_shift): New helper
function to expand V1TI mode logical shifts by integer constants.
(ix86_expand_v1ti_rotate): New helper function to expand V1TI
mode rotations by integer constants.
* config/i386/i386-protos.h (ix86_expand_v1ti_shift,
ix86_expand_v1ti_rotate): Prototype new functions here.
* config/i386/sse.md (ashlv1ti3, lshrv1ti3, rotlv1ti3, rotrv1ti3):
New TARGET_SSE2 expanders to implement V1TI shifts and rotations.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-v1ti-shift.c: New test case.
Diffstat (limited to 'libiberty/xatexit.c')
0 files changed, 0 insertions, 0 deletions