author: Roger Sayle <roger@nextmovesoftware.com> 2023-06-26 09:36:02 +0100
committer: Roger Sayle <roger@nextmovesoftware.com> 2023-06-26 09:36:02 +0100
commit: 83269719640689415c0d5026ebfe05a0cf2bab72
tree: a89aa63c5927c6c712f73541d438e1593f174067
parent: 1bfe7e5352d1f4ac525317454aca45aa80a517ba
i386: New *ashl<dwi>3_doubleword_highpart define_insn_and_split.

This patch contains a pair of (related) optimizations in i386.md that
allow us to generate better code for the example below (this is a step
towards fixing a bugzilla PR, but I've forgotten the number).

__int128 foo64(__int128 x, long long y)
{
  __int128 t = (__int128)y << 64;
  return x ^ t;
}

The hidden issue is that the RTL currently seen by reload contains the
sign extension of y from DImode to TImode, even though this is dead
(not required) for left shifts by more than WORD_SIZE bits.

(insn 11 8 12 2 (parallel [
            (set (reg:TI 0 ax [orig:91 y ] [91])
                (sign_extend:TI (reg:DI 1 dx [97])))
            (clobber (reg:CC 17 flags))
            (clobber (scratch:DI))
        ]) {extendditi2}

What makes this particularly undesirable is that the sign-extension
pattern above requires an additional DImode scratch register, indicated
by the clobber, which unnecessarily increases register pressure.

The proposed solution is to add a define_insn_and_split for such left
shifts (of sign or zero extensions) that only have a non-zero highpart,
where the extension is redundant and eliminated, that can be split after
reload, without scratch registers or early clobbers.

This (late split) exposes a second optimization opportunity where
setting the lowpart to zero can sometimes be combined/simplified with
the following instruction during peephole2.

For the test case above, we previously generated with -O2:

foo64:  xorl    %eax, %eax
        xorq    %rsi, %rdx
        xorq    %rdi, %rax
        ret

with this patch, we now generate:

foo64:  movq    %rdi, %rax
        xorq    %rsi, %rdx
        ret

Likewise for the related -m32 test case, we go from:

foo32:  movl    12(%esp), %eax
        movl    %eax, %edx
        xorl    %eax, %eax
        xorl    8(%esp), %edx
        xorl    4(%esp), %eax
        ret

to the improved:

foo32:  movl    12(%esp), %edx
        movl    4(%esp), %eax
        xorl    8(%esp), %edx
        ret

2023-06-26  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/i386.md (peephole2): Simplify zeroing a register
        followed by an IOR, XOR or PLUS operation on it, into a move.
        (*ashl<dwi>3_doubleword_highpart): New define_insn_and_split to
        eliminate (and hide from reload) unnecessary word to doubleword
        extensions that are followed by left shifts by sufficiently
        large, but valid, bit counts.

gcc/testsuite/ChangeLog
        * gcc.target/i386/ashldi3-1.c: New 32-bit test case.
        * gcc.target/i386/ashlti3-2.c: New 64-bit test case.
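
The two identities the message relies on can be checked at the C level.
The following is a minimal sketch, not part of the patch: it assumes GCC
(or a compatible compiler) on a 64-bit target with __int128, and relies
on GCC's defined behavior for left shifts of negative signed values.
foo64 is the test case from the message; shift_zext, zero_then_xor,
zero_then_ior, zero_then_plus and check_highpart_shift are hypothetical
names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The test case from the commit message. */
__int128 foo64(__int128 x, long long y)
{
  __int128 t = (__int128)y << 64;
  return x ^ t;
}

/* Hypothetical helper: build the shifted value from a zero extension
   instead of a sign extension.  For a shift by 64 the extended bits
   are shifted out, so the result should not depend on the choice. */
static __int128 shift_zext(long long y)
{
  return (__int128)((unsigned __int128)(uint64_t)y << 64);
}

/* Hypothetical helpers: C-level models of "zero a register, then
   XOR/IOR/PLUS a source into it", which the peephole2 turns into a
   plain move of the source. */
static uint64_t zero_then_xor(uint64_t src)  { uint64_t r = 0; r ^= src; return r; }
static uint64_t zero_then_ior(uint64_t src)  { uint64_t r = 0; r |= src; return r; }
static uint64_t zero_then_plus(uint64_t src) { uint64_t r = 0; r += src; return r; }

/* Assert both identities for one input value. */
void check_highpart_shift(long long y)
{
  __int128 sext = foo64(0, y);   /* (__int128)y << 64 */

  /* The extension is dead: sign and zero extension agree. */
  assert(sext == shift_zext(y));

  /* Only the highpart is non-zero: lowpart is 0, highpart is y. */
  assert((uint64_t)sext == 0);
  assert((uint64_t)((unsigned __int128)sext >> 64) == (uint64_t)y);

  /* Zeroing followed by XOR, IOR or PLUS is just a move. */
  assert(zero_then_xor((uint64_t)y) == (uint64_t)y);
  assert(zero_then_ior((uint64_t)y) == (uint64_t)y);
  assert(zero_then_plus((uint64_t)y) == (uint64_t)y);
}
```

Calling check_highpart_shift with negative and positive values (for
example -1 and 1) exercises the case where sign and zero extension
would differ if the extension were live.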