Age | Commit message | Author | Files | Lines |
|
The analysis will be needed by both the greedy register allocator and the
X86FloatingPoint pass. It only needs to be computed once when the CFG doesn't
change.
This pass is very fast, usually showing up as 0.0% wall time.
llvm-svn: 122832
|
|
makes getLeader() nonrecursive.
llvm-svn: 122811
|
|
spent in StrongPHIElimination on 403.gcc.
llvm-svn: 122803
|
|
static constructors.
llvm-svn: 122795
|
|
a 28% speedup of MachineCSE time on 403.gcc.
llvm-svn: 122735
|
|
so that Dominators.h is *just* domtree. Also prune #includes a bit.
llvm-svn: 122714
|
|
This allows us to compile:
void test(char *s, int a) {
__builtin_memset(s, a, 15);
}
into 1 mul + 3 stores instead of 3 muls + 3 stores.
llvm-svn: 122710
|
|
series of shifts and ors.
We could implement a DAGCombine to turn x * 0x0101 back into logic operations
on targets that don't support the multiply or where it is slow (p4) if someone
cares enough.
Example code:
void test(char *s, int a) {
__builtin_memset(s, a, 4);
}
before:
_test: ## @test
movzbl 8(%esp), %eax
movl %eax, %ecx
shll $8, %ecx
orl %eax, %ecx
movl %ecx, %eax
shll $16, %eax
orl %ecx, %eax
movl 4(%esp), %ecx
movl %eax, 4(%ecx)
movl %eax, (%ecx)
ret
after:
_test: ## @test
movzbl 8(%esp), %eax
imull $16843009, %eax, %eax ## imm = 0x1010101
movl 4(%esp), %ecx
movl %eax, 4(%ecx)
movl %eax, (%ecx)
ret
llvm-svn: 122707
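The equivalence behind the before/after listings above can be checked with a small sketch. It assumes the transform is the usual byte splat: replicating the low byte of the value into all four bytes of a 32-bit word. The function names (splatShiftOr, splatMul, checkSplats) are made up for illustration and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Shift/or expansion, mirroring the "before" assembly (shll/orl pairs).
std::uint32_t splatShiftOr(std::uint8_t b) {
  std::uint32_t v = b;  // movzbl
  v |= v << 8;          // shll $8  + orl
  v |= v << 16;         // shll $16 + orl
  return v;
}

// Single multiply, mirroring the "after" assembly (imull $0x1010101).
std::uint32_t splatMul(std::uint8_t b) {
  return static_cast<std::uint32_t>(b) * 0x01010101u;
}

// Exhaustively checks that both forms agree for every byte value.
void checkSplats() {
  for (unsigned b = 0; b < 256; ++b) {
    std::uint8_t byte = static_cast<std::uint8_t>(b);
    assert(splatShiftOr(byte) == splatMul(byte));
  }
}
```

The multiply cannot overflow 32 bits here, since the largest product is 0xFF * 0x01010101 = 0xFFFFFFFF, so the two forms are bit-identical for every input byte.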
|
|
with 2-address instructions, for about a 3.5% speedup of StrongPHIElimination on
403.gcc.
llvm-svn: 122635
|
|
llvm-svn: 122628
|
|
process those instructions that define phi sources. This is a 47% speedup of
StrongPHIElimination compile time on 403.gcc.
llvm-svn: 122627
|
|
llvm-svn: 122625
|
|
llvm-svn: 122617
|
|
in the most obvious way.
llvm-svn: 122610
|
|
it relies on assumptions that may not be true in the future.
llvm-svn: 122608
|
|
we are only interested in the defs when discovering interferences.
This is a 28% speedup running StrongPHIElimination on 403.gcc.
llvm-svn: 122596
|
|
in this function, but the compiler was warning that it might be when
doing a release build.
llvm-svn: 122595
|
|
llvm-svn: 122586
|
|
when running without the verifier, and I have not yet checked them to see if
the new results are still correct. There are more verifier failures, but they
all seem to be additional occurrences of verifier failures that occur with the
existing PHIElimination pass. There are a few obvious issues with the code:
1) It doesn't properly update the register equivalence classes during copy
insertion, and instead recomputes them before merging live intervals and
renaming registers. I wanted to keep this first patch simple for debugging
purposes, but it shouldn't be very hard to do this.
2) It doesn't mix the renaming and live interval merging with the copy insertion
process, which leads to a lot of virtual register churn. Virtual registers and
live intervals are created, only to later be merged into others. The code should
be smarter and only create a new virtual register if there is no existing
register in the same congruence class.
3) In one place the code uses a DenseMap per basic block, which is unnecessary
heap allocation. There should be an inline storage version of DenseMap.
I did a quick compile-time test of running llc on 403.gcc with and without
StrongPHIElimination. It is slightly slower with StrongPHIElimination, because
the small decrease in the coalescer runtime can't beat the increase in phi
elimination runtime. Perhaps fixing the above performance issues will narrow
the gap.
I also haven't yet run any tests of the quality of the generated code.
llvm-svn: 122582
|
|
valno verification. The "Different value live out of predecessor" check is
incorrect in the case of phi-def valnos, so just skip that check for phi-def
valnos and instead check that all of the valnos for predecessors have phi-kill.
Fixes PR8863.
llvm-svn: 122581
|
|
llvm-svn: 122545
|
|
scheduling node may have a NULL DAG node, yuck.
llvm-svn: 122544
|
|
DAG scheduling during isel. Most new functionality is currently
guarded by -enable-sched-cycles and -enable-sched-hazard.
Added InstrItineraryData::IssueWidth field, currently derived from
ARM itineraries, but could be initialized differently on other targets.
Added ScheduleHazardRecognizer::MaxLookAhead to indicate whether it is
active, and if so how many cycles of state it holds.
Added SchedulingPriorityQueue::HasReadyFilter to allow gating entry
into the scheduler's available queue.
ScoreboardHazardRecognizer now accesses the ScheduleDAG in order to
get information about its SUnits, provides RecedeCycle for bottom-up
scheduling, correctly computes scoreboard depth, tracks IssueCount, and
considers potential stall cycles when checking for hazards.
ScheduleDAGRRList now models machine cycles and hazards (under
flags). It tracks MinAvailableCycle, drives the hazard recognizer and
priority queue's ready filter, manages a new PendingQueue, properly
accounts for stall cycles, etc.
llvm-svn: 122541
|
|
llvm-svn: 122539
|
|
llvm-svn: 122537
|
|
llvm-svn: 122509
|
|
llvm-svn: 122507
|
|
and instruction issue.
llvm-svn: 122491
|
|
multiple nodes per cycle.
llvm-svn: 122474
|
|
llvm-svn: 122473
|
|
In the bottom-up selection DAG scheduling, handle two-address
instructions that read/write unspillable registers. Treat
the entire chain of two-address nodes as a single live range.
llvm-svn: 122472
|
|
new gcc warning that complains about self-assignments and
self-initializations.
llvm-svn: 122458
|
|
illegal. The latter usually compiles into smaller code.
example code:
unsigned foo(unsigned x, unsigned y) {
if (x != 0) y--;
return y;
}
before:
_foo: ## @foo
cmpl $1, 4(%esp) ## encoding: [0x83,0x7c,0x24,0x04,0x01]
sbbl %eax, %eax ## encoding: [0x19,0xc0]
notl %eax ## encoding: [0xf7,0xd0]
addl 8(%esp), %eax ## encoding: [0x03,0x44,0x24,0x08]
ret ## encoding: [0xc3]
after:
_foo: ## @foo
cmpl $1, 4(%esp) ## encoding: [0x83,0x7c,0x24,0x04,0x01]
movl 8(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x08]
adcl $-1, %eax ## encoding: [0x83,0xd0,0xff]
ret ## encoding: [0xc3]
llvm-svn: 122455
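The two assembly sequences above can be modeled in portable code to see why they compute the same result. This is a sketch under the assumption that the carry flag after "cmpl $1, x" is set exactly when x is 0 (unsigned x < 1). The subject line of this commit is not shown in the log, so the function names (fooBranch, fooSbb, fooAdc, checkFoo) are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Source-level form from the example code.
std::uint32_t fooBranch(std::uint32_t x, std::uint32_t y) {
  if (x != 0) y--;
  return y;
}

// "before": cmpl $1, x ; sbbl %eax, %eax ; notl %eax ; addl y, %eax
std::uint32_t fooSbb(std::uint32_t x, std::uint32_t y) {
  std::uint32_t carry = (x < 1u) ? 1u : 0u;  // CF is set when x == 0
  std::uint32_t eax = 0u - carry;            // sbbl: 0 or 0xFFFFFFFF
  eax = ~eax;                                // notl: 0xFFFFFFFF or 0
  return eax + y;                            // addl, wraps mod 2^32
}

// "after": cmpl $1, x ; movl y, %eax ; adcl $-1, %eax
std::uint32_t fooAdc(std::uint32_t x, std::uint32_t y) {
  std::uint32_t carry = (x < 1u) ? 1u : 0u;  // CF is set when x == 0
  return y + 0xFFFFFFFFu + carry;            // y - 1 + CF, wraps mod 2^32
}

// Compares all three forms on a few boundary values.
void checkFoo() {
  const std::uint32_t samples[] = {0u, 1u, 2u, 0x7FFFFFFFu, 0xFFFFFFFFu};
  for (std::uint32_t x : samples)
    for (std::uint32_t y : samples) {
      assert(fooSbb(x, y) == fooBranch(x, y));
      assert(fooAdc(x, y) == fooBranch(x, y));
    }
}
```

The adc form folds the decrement and the carry into one instruction, which matches the shorter "after" listing (three instructions before the ret instead of four).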
|
|
pick the victim with the lowest total spill weight.
llvm-svn: 122445
|
|
llvm-svn: 122444
|
|
loads properly. We miscompiled the testcase into:
_test: ## @test
movl $128, (%rdi)
movzbl 1(%rdi), %eax
ret
Now we get a proper:
_test: ## @test
movl $128, (%rdi)
movsbl (%rdi), %eax
movzbl %ah, %eax
ret
This fixes PR8757.
llvm-svn: 122392
|
|
unhandled cases faster and simplify code.
llvm-svn: 122391
|
|
llvm-svn: 122389
|
|
the same physical register. Simplifies the fix from the previous
checkin r122211.
llvm-svn: 122370
|
|
llvm-svn: 122368
|
|
the shift type was needed in one place, the shift count
type in another. The transform in 123555 had the same
problem.
llvm-svn: 122366
|
|
llvm-svn: 122360
|
|
llvm-svn: 122355
|
|
count operand. These should be the same but apparently are
not always, and this is cleaner anyway. This improves the
code in an existing test.
llvm-svn: 122354
|
|
llvm-svn: 122353
|
|
llvm-svn: 122349
|
|
llvm-svn: 122345
|
|
llvm-svn: 122342
|
|
of the problems with my last attempt were in the updating of LiveIntervals
rather than the coalescing itself. Therefore, I decided to get that right first
by essentially reimplementing the existing PHIElimination using LiveIntervals.
It works correctly, with only a few tests failing (which may not be legitimate
failures) and no new verifier failures (at least as far as I can tell; I didn't
count the number per file).
llvm-svn: 122321
|
|
something that just glues two nodes together, even if it is
sometimes used for flags.
llvm-svn: 122310
|