author | Roger Sayle <roger@nextmovesoftware.com> | 2022-08-15 17:46:38 +0100 |
---|---|---|
committer | Roger Sayle <roger@nextmovesoftware.com> | 2022-08-15 17:46:38 +0100 |
commit | f8cada540d85ac9d53b10f2e9265cb51f6f72514 (patch) | |
tree | 30cf846ce5784a49f6388860967ce5d92b61fd46 /libjava/javax/accessibility | |
parent | 6f94923dea21bd92ba2fc40c4a3be509bb1b7f0c (diff) | |
Support shifts and rotates by integer constants in TImode STV on x86_64.
This patch adds support for converting 128-bit TImode shifts and rotates
to SSE equivalents using V1TImode during the TImode STV pass.
Previously, only logical shifts by multiples of 8 were handled
(from my patch earlier this month).
As an example of the benefits, the following rotate by 32 bits:
unsigned __int128 a, b;
void rot32() { a = (b >> 32) | (b << 96); }
when compiled on x86_64 with -O2 previously generated:
movq b(%rip), %rax
movq b+8(%rip), %rdx
movq %rax, %rcx
shrdq $32, %rdx, %rax
shrdq $32, %rcx, %rdx
movq %rax, a(%rip)
movq %rdx, a+8(%rip)
ret
but with this patch now generates:
movdqa b(%rip), %xmm0
pshufd $57, %xmm0, %xmm0
movaps %xmm0, a(%rip)
ret
[which uses a V4SI permutation, for those who don't read SSE].
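To see why the immediate 57 corresponds to a rotate right by 32 bits, the sketch below emulates the lane selection of pshufd in portable C and compares it against the scalar expression from rot32(). The helper names (rotr32_scalar, pshufd_emulated, rotr32_shuffle, self_check) are illustrative assumptions and are not part of the patch; the code relies on the unsigned __int128 extension provided by GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: the same expression as rot32() in the commit message. */
static unsigned __int128 rotr32_scalar(unsigned __int128 b)
{
    return (b >> 32) | (b << 96);
}

/* Portable emulation of PSHUFD lane selection (hypothetical helper):
   destination lane i is the source lane named by bits [2i+1:2i] of imm. */
static void pshufd_emulated(uint32_t dst[4], const uint32_t src[4], unsigned imm)
{
    uint32_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm >> (2 * i)) & 3];
    memcpy(dst, tmp, sizeof tmp);   /* allows dst == src */
}

/* Rotate right by 32 expressed as a four-lane permutation.
   57 == 0b00111001 selects source lanes 1, 2, 3, 0 for lanes 0..3. */
static unsigned __int128 rotr32_shuffle(unsigned __int128 b)
{
    uint32_t lanes[4];
    for (int i = 0; i < 4; i++)
        lanes[i] = (uint32_t)(b >> (32 * i));   /* lane 0 = lowest 32 bits */

    pshufd_emulated(lanes, lanes, 57);

    unsigned __int128 r = 0;
    for (int i = 0; i < 4; i++)
        r |= (unsigned __int128)lanes[i] << (32 * i);
    return r;
}

/* Small self-check comparing both formulations on one sample value. */
static void self_check(void)
{
    unsigned __int128 v =
        ((unsigned __int128)0x0123456789abcdefULL << 64) | 0xfedcba9876543210ULL;
    assert(rotr32_shuffle(v) == rotr32_scalar(v));
}
```

Each destination lane simply takes the next-higher source lane, with the top lane wrapping around to lane 0, which is exactly a 128-bit rotate right by one 32-bit element.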
This should help 128-bit cryptographic code that interleaves XORs
with rotations (but that doesn't use additions or subtractions).
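As a rough illustration of the kind of code meant here, the following is a made-up mixing step that uses only XORs and rotates by integer constants on a 128-bit value. The names u128, rotl128 and mix_round are hypothetical and the function is not taken from any real cipher; whether the STV pass actually converts such code depends on the cost model and on how the values are loaded and stored.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* Rotate left by a constant count; n must be in the range 1..127
   so that neither shift count reaches 128. */
static inline u128 rotl128(u128 x, unsigned n)
{
    return (x << n) | (x >> (128 - n));
}

/* Hypothetical mixing round: XORs interleaved with constant rotates,
   with no additions or subtractions. */
static u128 mix_round(u128 state, u128 key)
{
    state ^= key;
    state ^= rotl128(state, 32);
    state ^= rotl128(state, 64);
    return state;
}
```

For example, mix_round(1, 0) sets bits 0, 32, 64 and 96, since each rotate-and-XOR step copies the already-set bits to the rotated positions.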
2022-08-15 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-features.cc
(timode_scalar_chain::compute_convert_gain): Provide costs for
shifts and rotates.
(timode_scalar_chain::convert_insn): Handle ASHIFTRT, ROTATERT
and ROTATE just like existing ASHIFT and LSHIFTRT cases.
(timode_scalar_to_vector_candidate_p): Handle all shifts and
rotates by integer constants between 0 and 127.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse4_1-stv-9.c: New test case.