author | Jay Foad <jay.foad@amd.com> | 2023-07-28 11:12:59 +0100 |
---|---|---|
committer | Jay Foad <jay.foad@amd.com> | 2023-08-07 15:41:40 +0100 |
commit | 56d92c17583e5f0b5e1e521b5f614be79436fccc (patch) | |
tree | a01b5cd27b41175516118c52e9708defd9fe9dda /llvm/lib/FileCheck | |
parent | 97324f6274184e607fa6d6cffb1aebee317d4644 (diff) | |
[MachineScheduler] Track physical register dependencies per-regunit

Change the scheduler's physical register dependency tracking from
registers-and-their-aliases to regunits. This has a couple of advantages
when subregisters are used:

- The dependency tracking is more accurate and creates fewer useless
  edges in the dependency graph. An AMDGPU example, edited for clarity:

    SU(0): $vgpr1 = V_MOV_B32 $sgpr0
    SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
    SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0

  There is a data dependency on $vgpr1 from SU(0) to SU(1) and from
  SU(1) to SU(2). But the old dependency tracking code also added a
  useless edge from SU(0) to SU(2) because it thought that SU(0)'s def
  of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.

- On targets like AMDGPU that make heavy use of subregisters, each
  register can have a huge number of aliases - it can be quadratic in
  the size of the largest defined register tuple. There is a much lower
  bound on the number of regunits per register, so iterating over
  regunits is faster than iterating over aliases.
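
The two points above can be illustrated with a small toy model. This is
only a sketch in Python, not the ScheduleDAGInstrs code: it assumes one
regunit per 32-bit register, treats a tuple name such as vgpr0_vgpr1 as
the set of its components, and walks the three SUs bottom-up while
keeping a table of pending uses. All function and variable names are
hypothetical.

```python
# Toy model (assumption): each component of a register name is one regunit,
# so "vgpr0_vgpr1" covers the regunits {"vgpr0", "vgpr1"}.
def regunits(reg):
    return frozenset(reg.split("_"))

# Every register the toy target knows about.
ALL_REGS = ["sgpr0", "vgpr0", "vgpr1", "vgpr0_vgpr1"]

def aliases(reg):
    """Registers sharing at least one regunit with reg (including reg)."""
    return [r for r in ALL_REGS if regunits(r) & regunits(reg)]

# (defs, uses) for each scheduling unit in the example.
SUNITS = [
    (["vgpr1"], ["sgpr0"]),              # SU(0): $vgpr1 = V_MOV_B32 $sgpr0
    (["vgpr1"], ["vgpr1"]),              # SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
    (["vgpr0_vgpr1"], ["vgpr0_vgpr1"]),  # SU(2): FLAT_LOAD_DWORDX2
]

def data_edges(sunits, keys_for_def, keys_for_use):
    """Walk bottom-up and return the set of (def_su, use_su) data edges.

    keys_for_def(reg): table keys searched for pending uses at a def.
    keys_for_use(reg): table keys recorded at a use and cleared at a def.
    """
    pending_uses = {}  # key -> set of SUs whose use has no def yet
    edges = set()
    for su in range(len(sunits) - 1, -1, -1):
        defs, uses = sunits[su]
        for reg in defs:
            for key in keys_for_def(reg):
                for user in pending_uses.get(key, ()):
                    edges.add((su, user))
            # The def satisfies pending uses recorded under its own keys.
            for key in keys_for_use(reg):
                pending_uses.pop(key, None)
        for reg in uses:
            for key in keys_for_use(reg):
                pending_uses.setdefault(key, set()).add(su)
    return edges

# Alias-based tracking: the table is keyed by register, defs search all aliases.
alias_edges = data_edges(SUNITS, aliases, lambda reg: [reg])
# Regunit-based tracking: the table is keyed by regunit on both sides.
regunit_edges = data_edges(SUNITS, regunits, regunits)

def alias_count(start, size, max_tuple, num_regs):
    """Count toy registers overlapping the tuple [start, start + size).

    Assumes every contiguous range of 1..max_tuple registers is a register.
    """
    count = 0
    for length in range(1, max_tuple + 1):
        for s in range(0, num_regs - length + 1):
            if s < start + size and start < s + length:
                count += 1
    return count - 1  # exclude the tuple itself

print(sorted(alias_edges))
print(sorted(regunit_edges))
print(alias_count(6, 4, 4, 16), "aliases vs", 4, "regunits")
```

In this model the alias-keyed table keeps SU(2)'s use of vgpr0_vgpr1
pending after SU(1) defines vgpr1, which is where the extra SU(0) to
SU(2) edge comes from. The regunit-keyed table clears the vgpr1 entry at
SU(1), so SU(0) only sees SU(1). The alias count grows roughly with the
square of the tuple width, while the regunit count grows linearly.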

The LLVM compile-time tracker shows a tiny overall improvement of 0.03%
on X86. I expect a larger compile-time improvement on targets like
AMDGPU.

Recommit after fixing AggressiveAntiDepBreaker in D156880.

Differential Revision: https://reviews.llvm.org/D156552