author     Roger Sayle <roger@nextmovesoftware.com>  2021-11-02 08:23:04 +0000
committer  Roger Sayle <roger@nextmovesoftware.com>  2021-11-02 08:26:14 +0000
commit     1188cf5fb7d9c3f0753cdb11d961fe90113991e8
tree       21e30cc901caabdce18eaed9dbd9da52928d640a
parent     e72f783c86264dc4a8f0addd340a9950da1d1b6d
x86_64: Expand ashrv1ti (and PR target/102986)
This patch was originally intended to implement 128-bit arithmetic right
shifts of vector registers (V1TImode) by constant amounts, but while working
on it I discovered that my own recent changes had introduced the ICE-on-valid
regression now known as PR target/102986.
As diagnosed by Jakub, shift expanders are not allowed to fail, and so any
backend that provides a shift optab needs to handle shifts by variable
amounts as well as by constants [even though the middle-end knows how to
synthesize these for vector modes]. This constraint could be relaxed in
the middle-end, but it makes sense to fix this in my erroneous code.
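For example, a V1TImode shift by a run-time amount, written with GCC's
vector extensions roughly as below (illustrative only; the typedef and
function name are not from the patch), is exactly the variable-amount case
the expander previously tripped over:

  typedef unsigned __int128 uv1ti __attribute__ ((vector_size (16)));

  uv1ti
  ashl_var (uv1ti x, unsigned int n)
  {
    return x << n;    /* shift by a non-constant amount  */
  }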
The solution is to change the shift amount operand of the recently added
shift expanders from an SImode const_int_operand to a QImode general_operand,
matching the TImode expanders' constraints, and then simply check for
!CONST_INT_P at the start of the ix86_expand_v1ti_* functions: when the
amount isn't a constant, convert the operand from V1TImode to TImode, perform
the shift in TImode, and convert the result back to V1TImode.
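In outline, the non-constant case takes roughly the shape below. This is a
minimal sketch, not the code from the patch; the helper signatures and the
exact optab call are assumptions:

  void
  ix86_expand_v1ti_ashiftrt (rtx operands[])
  {
    if (!CONST_INT_P (operands[2]))
      {
	/* Non-constant shift amount: lower the V1TImode shift to a TImode
	   shift and move the result back (sketch; the helpers are assumed
	   to take and return an rtx).  */
	rtx t = ix86_expand_v1ti_to_ti (force_reg (V1TImode, operands[1]));
	rtx r = expand_simple_binop (TImode, ASHIFTRT, t, operands[2],
				     NULL_RTX, false, OPTAB_LIB_WIDEN);
	emit_move_insn (operands[0], ix86_expand_ti_to_v1ti (r));
	return;
      }
    /* ... constant shift amounts use the sequences listed below ...  */
  }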
One nice benefit of this strategy is that it allows us to implement Uros'
recent suggestion that we convert between these modes more efficiently,
avoiding the use of memory and using the same idiom as LLVM, or using
pextrq/pinsrq where available. The new helper functions
ix86_expand_v1ti_to_ti and ix86_expand_ti_to_v1ti are sufficient to take
care of this. Interestingly, partial support for this is already present,
but x86_64's generic tuning prefers memory transfers to avoid penalizing
microarchitectures with significant interunit delays. With these changes
we now generate both pextrq and pinsrq for -mtune=native.
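At the source level the conversions performed by the helpers correspond to
something like the functions below (illustrative only; the typedef and
function names are mine, and the instructions in the comments are what one
would expect with -msse4.1, not output copied from the patch):

  typedef unsigned __int128 uv1ti __attribute__ ((vector_size (16)));

  unsigned __int128
  v1ti_to_ti (uv1ti x)
  {
    return x[0];		/* e.g. movq + pextrq with SSE4.1  */
  }

  uv1ti
  ti_to_v1ti (unsigned __int128 x)
  {
    return (uv1ti) { x };	/* e.g. movq + pinsrq with SSE4.1  */
  }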
The main body of the patch implements arithmetic right shift in addition to
the logical right shift and left shift implemented in the previous patch.
This expander provides no fewer than 13 different code sequences, special
casing the different constant shifts, including variants taking advantage
of TARGET_AVX2 and TARGET_SSE4_1. The code is structured with the
faster/shorter sequences at the start and the generic implementations at
the end.
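For reference, the kind of function these sequences implement can be written
with GCC's vector extensions roughly as below (the typedef and the shift
count of 17 are just for illustration, not taken from the patch):

  typedef __int128 sv1ti __attribute__ ((vector_size (16)));

  sv1ti
  ashr_17 (sv1ti x)
  {
    return x >> 17;		/* expands via the new ashrv1ti3 pattern  */
  }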
For the record, the implementations are:
ashr_127: // Shift 127, 2 operations, 10 bytes
pshufd $255, %xmm0, %xmm0
psrad $31, %xmm0
ret
ashr_64: // Shift by 64, 3 operations, 14 bytes
pshufd $255, %xmm0, %xmm1
psrad $31, %xmm1
punpckhqdq %xmm1, %xmm0
ret
ashr_96: // Shift by 96, 3 operations, 18 bytes
movdqa %xmm0, %xmm1
psrad $31, %xmm1
punpckhqdq %xmm1, %xmm0
pshufd $253, %xmm0, %xmm0
ret
ashr_8: // Shift by 8/16/24/32 on AVX2, 3 operations, 16 bytes
vpsrad $8, %xmm0, %xmm1
vpsrldq $1, %xmm0, %xmm0
vpblendd $7, %xmm0, %xmm1, %xmm0
ret
ashr_8: // Shift by 8/16/24/32 on SSE4.1, 3 operations, 24 bytes
movdqa %xmm0, %xmm1
psrldq $1, %xmm0
psrad $8, %xmm1
pblendw $63, %xmm0, %xmm1
movdqa %xmm1, %xmm0
ret
ashr_97: // Shifts by 97..126, 4 operations, 23 bytes
movdqa %xmm0, %xmm1
psrad $31, %xmm0
psrad $1, %xmm1
punpckhqdq %xmm0, %xmm1
pshufd $253, %xmm1, %xmm0
ret
ashr_48: // Shifts by 48/80 on SSE4.1, 4 operations, 25 bytes
movdqa %xmm0, %xmm1
pshufd $255, %xmm0, %xmm0
psrldq $6, %xmm1
psrad $31, %xmm0
pblendw $31, %xmm1, %xmm0
ret
ashr_8: // Shifts by multiples of 8, 5 operations, 28 bytes
movdqa %xmm0, %xmm1
pshufd $255, %xmm0, %xmm0
psrad $31, %xmm0
psrldq $1, %xmm1
pslldq $15, %xmm0
por %xmm1, %xmm0
ret
ashr_1: // Shifts by 1..31 on AVX2, 6 operations, 30 bytes
vpsrldq $8, %xmm0, %xmm2
vpsrad $1, %xmm0, %xmm1
vpsllq $63, %xmm2, %xmm2
vpsrlq $1, %xmm0, %xmm0
vpor %xmm2, %xmm0, %xmm0
vpblendd $7, %xmm0, %xmm1, %xmm0
ret
ashr_1: // Shifts by 1..15 on SSE4.1, 6 operations, 42 bytes
movdqa %xmm0, %xmm2
movdqa %xmm0, %xmm1
psrldq $8, %xmm2
psrlq $1, %xmm0
psllq $63, %xmm2
psrad $1, %xmm1
por %xmm2, %xmm0
pblendw $63, %xmm0, %xmm1
movdqa %xmm1, %xmm0
ret
ashr_1: // Shift by 1, 8 operations, 46 bytes
movdqa %xmm0, %xmm1
movdqa %xmm0, %xmm2
psrldq $8, %xmm2
psrlq $63, %xmm1
psllq $63, %xmm2
psrlq $1, %xmm0
pshufd $191, %xmm1, %xmm1
por %xmm2, %xmm0
psllq $31, %xmm1
por %xmm1, %xmm0
ret
ashr_65: // Shifts by 65..95, 8 operations, 42 bytes
pshufd $255, %xmm0, %xmm1
psrldq $8, %xmm0
psrad $31, %xmm1
psrlq $1, %xmm0
movdqa %xmm1, %xmm2
psllq $63, %xmm1
pslldq $8, %xmm2
por %xmm2, %xmm1
por %xmm1, %xmm0
ret
ashr_2: // Shifts from 2..63, 9 operations, 47 bytes
pshufd $255, %xmm0, %xmm1
movdqa %xmm0, %xmm2
psrad $31, %xmm1
psrldq $8, %xmm2
psllq $62, %xmm2
psrlq $2, %xmm0
pslldq $8, %xmm1
por %xmm2, %xmm0
psllq $62, %xmm1
por %xmm1, %xmm0
ret
To test these changes there are several new test cases. sse2-v1ti-shift-2.c
is a compile test designed to catch PR target/102986 [for all shifts and
rotates by variable amounts], and sse2-v1ti-shift-3.c is an execution test
to confirm that shifts/rotates by variable amounts produce the same results
for TImode and V1TImode. sse2-v1ti-ashiftrt-1.c is a similar execution test
to confirm that arithmetic right shifts by different constants produce
identical results for TImode and V1TImode. sse2-v1ti-ashiftrt-[23].c are
duplicates of this file as compile tests specifying -mavx2 and -msse4.1
respectively, to trigger all the paths through the new expander.
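A minimal sketch of the style of execution test described above (not the
actual testsuite file; the typedef, names and constants are mine):

  typedef __int128 sv1ti __attribute__ ((vector_size (16)));

  __attribute__ ((noinline))
  sv1ti ashr_v1ti_17 (sv1ti x) { return x >> 17; }

  __attribute__ ((noinline))
  __int128 ashr_ti_17 (__int128 x) { return x >> 17; }

  int
  main (void)
  {
    __int128 x = ((__int128) 0xfedcba9876543210ULL << 64)
		 | 0x0123456789abcdefULL;
    sv1ti v = { x };
    sv1ti r = ashr_v1ti_17 (v);
    if (r[0] != ashr_ti_17 (x))
      __builtin_abort ();
    return 0;
  }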
2021-11-02 Roger Sayle <roger@nextmovesoftware.com>
Jakub Jelinek <jakub@redhat.com>
gcc/ChangeLog
PR target/102986
* config/i386/i386-expand.c (ix86_expand_v1ti_to_ti,
ix86_expand_ti_to_v1ti): New helper functions.
(ix86_expand_v1ti_shift): Check if the amount operand is an
integer constant, and expand as a TImode shift if it isn't.
(ix86_expand_v1ti_rotate): Check if the amount operand is an
integer constant, and expand as a TImode rotate if it isn't.
(ix86_expand_v1ti_ashiftrt): New function to expand arithmetic
right shifts of V1TImode quantities.
* config/i386/i386-protos.h (ix86_expand_v1ti_ashiftrt): Prototype.
* config/i386/sse.md (ashlv1ti3, lshrv1ti3): Change constraints
to QImode general_operand, and let the helper functions lower
shifts by non-constant operands as TImode shifts. Make
conditional on TARGET_64BIT.
(ashrv1ti3): New expander calling ix86_expand_v1ti_ashiftrt.
(rotlv1ti3, rotrv1ti3): Change shift operand to QImode.
Make conditional on TARGET_64BIT.
gcc/testsuite/ChangeLog
PR target/102986
* gcc.target/i386/sse2-v1ti-ashiftrt-1.c: New test case.
* gcc.target/i386/sse2-v1ti-ashiftrt-2.c: New test case.
* gcc.target/i386/sse2-v1ti-ashiftrt-3.c: New test case.
* gcc.target/i386/sse2-v1ti-shift-2.c: New test case.
* gcc.target/i386/sse2-v1ti-shift-3.c: New test case.