author     Roger Sayle <roger@nextmovesoftware.com>   2022-06-25 09:35:45 +0100
committer  Roger Sayle <roger@nextmovesoftware.com>   2022-06-25 09:35:45 +0100
commit     defa8537afc734faefde07c9ebdb38252133fbb1 (patch)
tree       1c4fa22a20fefde10307b3cbea8843578d2f0a6e /gcc
parent     476ef855d08db02a027150ea92611c1626ea7350 (diff)
Iterating cprop_hardreg... Third time's a charm.
This middle-end patch proposes the "hard register constant propagation"
pass be performed up to three times on each basic block (up from the
current two times) if the second pass successfully made changes.
The motivation for three passes is to handle the "swap idiom" (i.e.
"t = x; x = y; y = t;" sequences) that gets generated by register
allocation (reload).
Consider the x86_64 test case for __int128 addition recently discussed
on gcc-patches. With that proposed patch, the input to the cprop_hardreg
pass looks like:
movq %rdi, %r8
movq %rsi, %rdi
movq %r8, %rsi
movq %rdx, %rax
movq %rcx, %rdx
addq %rsi, %rax
adcq %rdi, %rdx
ret
where the first three instructions effectively swap %rsi and %rdi.
On the first pass of cprop_hardreg, we notice that the third insn,
%rsi := %r8, is redundant and can be eliminated/propagated to produce:
movq %rdi, %r8
movq %rsi, %rdi
movq %rdx, %rax
movq %rcx, %rdx
addq %r8, %rax
adcq %rdi, %rdx
ret
Because a successful propagation was found, cprop_hardreg then runs
a second pass/sweep over the affected basic blocks (using a worklist),
and on this second pass notices that the second instruction, %rdi := %rsi,
may now be propagated (the kill of %rsi was removed by the first
transform), and after this second pass, we end up with:
movq %rdi, %r8
movq %rdx, %rax
movq %rcx, %rdx
addq %r8, %rax
adcq %rsi, %rdx
ret
which is the current behaviour on mainline. However, a third and final
pass would now notice that the first insn, "%r8 := %rdi" is also now
eliminable, and a third iteration would produce optimal code:
movq %rdx, %rax
movq %rcx, %rdx
addq %rdi, %rax
adcq %rsi, %rdx
ret
The patch below creates two worklists and alternates between them on
successive passes, populating NEXT with the basic-block ids of blocks
that were updated during the current pass over the CURR worklist.
It should be noted that this is a regression fix; GCC 4.8 generated
optimal code with two moves (whereas GCC 12 required 5 moves, up
from GCC 11's 4 moves).
2022-06-25 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog:
* regcprop.cc (pass_cprop_hardreg::execute): Perform a third
iteration over each basic block that was updated by the second
iteration.
Diffstat (limited to 'gcc')
 gcc/regcprop.cc | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc
index 1fdc367..eacc59f 100644
--- a/gcc/regcprop.cc
+++ b/gcc/regcprop.cc
@@ -1383,7 +1383,9 @@ pass_cprop_hardreg::execute (function *fun)
   auto_sbitmap visited (last_basic_block_for_fn (fun));
   bitmap_clear (visited);

-  auto_vec<int> worklist;
+  auto_vec<int> worklist1, worklist2;
+  auto_vec<int> *curr = &worklist1;
+  auto_vec<int> *next = &worklist2;
   bool any_debug_changes = false;

   /* We need accurate notes.  Earlier passes such as if-conversion may
@@ -1404,7 +1406,7 @@ pass_cprop_hardreg::execute (function *fun)
   FOR_EACH_BB_FN (bb, fun)
     {
       if (cprop_hardreg_bb (bb, all_vd, visited))
-        worklist.safe_push (bb->index);
+        curr->safe_push (bb->index);
       if (all_vd[bb->index].n_debug_insn_changes)
         any_debug_changes = true;
     }
@@ -1416,16 +1418,22 @@ pass_cprop_hardreg::execute (function *fun)
   if (MAY_HAVE_DEBUG_BIND_INSNS && any_debug_changes)
     cprop_hardreg_debug (fun, all_vd);

-  /* Second pass if we've changed anything, only for the bbs where we have
-     changed anything though.  */
-  if (!worklist.is_empty ())
+  /* Repeat pass up to PASSES times, but only processing basic blocks
+     that have changed on the previous iteration.  CURR points to the
+     current worklist, and each iteration populates the NEXT worklist,
+     swapping pointers after each cycle.  */
+  unsigned int passes = optimize > 1 ? 3 : 2;
+  for (unsigned int pass = 2; pass <= passes && !curr->is_empty (); pass++)
     {
       any_debug_changes = false;
       bitmap_clear (visited);
-      for (int index : worklist)
+      next->truncate (0);
+      for (int index : *curr)
         {
           bb = BASIC_BLOCK_FOR_FN (fun, index);
-          cprop_hardreg_bb (bb, all_vd, visited);
+          if (cprop_hardreg_bb (bb, all_vd, visited))
+            next->safe_push (bb->index);
           if (all_vd[bb->index].n_debug_insn_changes)
             any_debug_changes = true;
         }
@@ -1433,6 +1441,7 @@ pass_cprop_hardreg::execute (function *fun)
       df_analyze ();
       if (MAY_HAVE_DEBUG_BIND_INSNS && any_debug_changes)
         cprop_hardreg_debug (fun, all_vd);
+      std::swap (curr, next);
     }

   free (all_vd);