author     Roger Sayle <roger@nextmovesoftware.com>    2023-06-26 09:36:02 +0100
committer  Roger Sayle <roger@nextmovesoftware.com>    2023-06-26 09:36:02 +0100
commit     83269719640689415c0d5026ebfe05a0cf2bab72
tree       a89aa63c5927c6c712f73541d438e1593f174067
parent     1bfe7e5352d1f4ac525317454aca45aa80a517ba

i386: New *ashl<dwi>3_doubleword_highpart define_insn_and_split.
This patch contains a pair of (related) optimizations in i386.md that
allow us to generate better code for the example below (this is a step
towards fixing a bugzilla PR, but I've forgotten the number).
__int128 foo64(__int128 x, long long y)
{
  __int128 t = (__int128)y << 64;
  return x ^ t;
}
The hidden issue is that the RTL currently seen by reload contains
the sign extension of y from DImode to TImode, even though this
extension is dead (not required) for left shifts by WORD_SIZE bits
or more.
(insn 11 8 12 2 (parallel [
            (set (reg:TI 0 ax [orig:91 y ] [91])
                (sign_extend:TI (reg:DI 1 dx [97])))
            (clobber (reg:CC 17 flags))
            (clobber (scratch:DI))
        ]) {extendditi2}
What makes this particularly undesirable is that the sign-extension
pattern above requires an additional DImode scratch register, indicated
by the clobber, which unnecessarily increases register pressure.
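As a quick sanity check that the extension really is dead: for a
doubleword left shift by WORD_SIZE bits or more, the lowpart of the
result is zero and the highpart depends only on the unextended low
word, so sign- and zero-extending y give identical results.  A minimal
sketch of this, assuming a 64-bit target with __int128 support (the
foo64s/foo64u helpers are purely illustrative, not part of the patch
or its testsuite):

#include <assert.h>

static __int128 foo64s(__int128 x, long long y)
{
  return x ^ ((__int128)y << 64);   /* y is sign-extended to TImode */
}

static __int128 foo64u(__int128 x, unsigned long long y)
{
  return x ^ ((__int128)y << 64);   /* y is zero-extended to TImode */
}

int main(void)
{
  __int128 x = ((__int128)0x0123456789abcdefULL << 64) | 0xfedcba9876543210ULL;
  long long y = -42;

  /* The extension bits are shifted out entirely (they end up above
     bit 127), so both variants compute the same value.  */
  assert(foo64s(x, y) == foo64u(x, (unsigned long long)y));
  return 0;
}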
The proposed solution is to add a define_insn_and_split for such
left shifts (of sign or zero extensions) whose result is non-zero
only in the highpart.  For these the extension is redundant and can
be eliminated, and the pattern can be split after reload without
scratch registers or early clobbers.
This (late split) exposes a second optimization opportunity where
setting the lowpart to zero can sometimes be combined/simplified with
the following instruction during peephole2.
For the test case above, we previously generated with -O2:
foo64:  xorl    %eax, %eax
        xorq    %rsi, %rdx
        xorq    %rdi, %rax
        ret
with this patch, we now generate:
foo64:  movq    %rdi, %rax
        xorq    %rsi, %rdx
        ret
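The second optimization is what's at work here: the old code zeroes
%rax and then XORs %rdi into it, and the new peephole2 collapses that
zero-then-XOR pair into the plain movq %rdi, %rax seen above.  The
transformation relies only on 0 being the identity for IOR, XOR and
PLUS; a trivial stand-alone check of that identity (purely
illustrative, not taken from the patch):

#include <assert.h>
#include <stdint.h>

int main(void)
{
  uint64_t src = 0x0123456789abcdefULL;
  uint64_t r;

  r = 0; r |= src; assert(r == src);   /* zero then IOR  behaves like a move */
  r = 0; r ^= src; assert(r == src);   /* zero then XOR  behaves like a move */
  r = 0; r += src; assert(r == src);   /* zero then PLUS behaves like a move */
  return 0;
}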
Likewise for the related -m32 test case, we go from:
foo32:  movl    12(%esp), %eax
        movl    %eax, %edx
        xorl    %eax, %eax
        xorl    8(%esp), %edx
        xorl    4(%esp), %eax
        ret
to the improved:
foo32:  movl    12(%esp), %edx
        movl    4(%esp), %eax
        xorl    8(%esp), %edx
        ret
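The 32-bit test exercises the same pattern with a DImode shift by 32.
A sketch of what such a test case might look like (the actual
gcc.target/i386/ashldi3-1.c may well differ):

long long foo32(long long x, int y)
{
  long long t = (long long)y << 32;
  return x ^ t;
}

With the usual -m32 stack calling convention this takes x in
4(%esp)/8(%esp) and y in 12(%esp), matching the assembly above.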
2023-06-26  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/i386.md (peephole2): Simplify zeroing a register
        followed by an IOR, XOR or PLUS operation on it, into a move.
        (*ashl<dwi>3_doubleword_highpart): New define_insn_and_split to
        eliminate (and hide from reload) unnecessary word to doubleword
        extensions that are followed by left shifts by sufficiently large,
        but valid, bit counts.

gcc/testsuite/ChangeLog
        * gcc.target/i386/ashldi3-1.c: New 32-bit test case.
        * gcc.target/i386/ashlti3-2.c: New 64-bit test case.