Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch adds the MIR parsing and serialization support for save and
restore points with subsets of callee saved registers. That is, it
syntactically allows a function to contain two or more distinct
sub-regions in which distinct subsets of registers are spilled/filled as
callee save. This is useful if e.g. one of the CSRs isn't modified in
one of the sub-regions, but is in the other(s).
Support for actually using this capability in code generation is still
forthcoming. This patch is the next logical step for multiple
save/restore points support.
All points are now stored in DenseMap from MBB to vector of
CalleeSavedInfo.
Shrink-Wrap points split Part 4.
RFC:
https://discourse.llvm.org/t/shrink-wrap-save-restore-points-splitting/83581
Part 1: https://github.com/llvm/llvm-project/pull/117862 (landed)
Part 2: https://github.com/llvm/llvm-project/pull/119355 (landed)
Part 3: https://github.com/llvm/llvm-project/pull/119357 (landed)
Part 5: https://github.com/llvm/llvm-project/pull/119359 (likely to be
further split)
|
|
Currently mir supports only one save and one restore point
specification:
```
savePoint: '%bb.1'
restorePoint: '%bb.2'
```
This patch provide possibility to have multiple save and multiple
restore points in mir:
```
savePoints:
- point: '%bb.1'
restorePoints:
- point: '%bb.2'
```
Shrink-Wrap points split Part 3.
RFC:
https://discourse.llvm.org/t/shrink-wrap-save-restore-points-splitting/83581
Part 1: https://github.com/llvm/llvm-project/pull/117862
Part 2: https://github.com/llvm/llvm-project/pull/119355
Part 4: https://github.com/llvm/llvm-project/pull/119358
Part 5: https://github.com/llvm/llvm-project/pull/119359
|
|
for huge frame offsets" (#152239)
The first commit is identical to
69bec0afbb8f2aa0021d18ea38768360b16583a9.
The second commit fixes the instruction verification failures by
replacing the erroneous instruction with a trap after the error is
reported and adds `-verify-machineinstrs` to the tests added in the
original PR to catch the issue sooner.
After that change, all tests pass with both
`LLVM_ENABLE_EXPENSIVE_CHECKS={On,Off}`.
cc @RKSimon @e-kud @phoebewang @arsenm as reviewers on the original PR
|
|
huge frame offsets" (#151975)
Reverts llvm/llvm-project#123872 - this is breaking on EXPENSIVE_CHECKS builds
Co-authored-by: Abhishek Kaushik <abhishek.kaushik@intel.com>
|
|
frame offsets (#123872)
The assertion previously did not work correctly because the operand was
being truncated to an `int` prior to comparison.
Change the assertion into a a reported error as suggested in
https://github.com/llvm/llvm-project/pull/101840#issuecomment-2304992425
by @arsenm
Finally, fix the lowering on 64-bit targets so that offsets larger than
32-bit are correctly addressed and add tests for various reported
issues.
|
|
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
|
|
(#140002)
Add per-property has<Prop>/set<Prop>/reset<Prop> functions to
MachineFunctionProperties.
|
|
|
|
Add support for using the existing SCRATCH_STORE_BLOCK and
SCRATCH_LOAD_BLOCK instructions for saving and restoring callee-saved
VGPRs. This is controlled by a new subtarget feature, block-vgpr-csr. It
does not include WWM registers - those will be saved and restored
individually, just like before. This patch does not change the ABI.
Use of this feature may lead to slightly increased stack usage, because
the memory is not compacted if certain registers don't have to be
transferred (this will happen in practice for calling conventions where
the callee and caller saved registers are interleaved in groups of 8).
However, if the registers at the end of the block of 32 don't have to be
transferred, we don't need to use a whole 128-byte stack slot - we can
trim some space off the end of the range.
In order to implement this feature, we need to rely less on the
target-independent code in the PrologEpilogInserter, so we override
several new methods in SIFrameLowering. We also add new pseudos,
SI_BLOCK_SPILL_V1024_SAVE/RESTORE.
One peculiarity is that both the SI_BLOCK_V1024_RESTORE pseudo and the
SCRATCH_LOAD_BLOCK instructions will have all the registers that are not
transferred added as implicit uses. This is done in order to inform
LiveRegUnits that those registers are not available before the restore
(since we're not really restoring them - so we can't afford to scavenge
them). Unfortunately, this trick doesn't work with the save, so before
the save all the registers in the block will be unavailable (see the
unit test).
This was reverted due to failures in the builds with expensive checks
on, now fixed by always updating LiveIntervals and SlotIndexes in
SILowerSGPRSpills.
|
|
Reverts llvm/llvm-project#130013 due to failures with expensive checks
on.
|
|
Add support for using the existing `SCRATCH_STORE_BLOCK` and
`SCRATCH_LOAD_BLOCK` instructions for saving and restoring callee-saved
VGPRs. This is controlled by a new subtarget feature, `block-vgpr-csr`.
It does not include WWM registers - those will be saved and restored
individually, just like before. This patch does not change the ABI.
Use of this feature may lead to slightly increased stack usage, because
the memory is not compacted if certain registers don't have to be
transferred (this will happen in practice for calling conventions where
the callee and caller saved registers are interleaved in groups of 8).
However, if the registers at the end of the block of 32 don't have to be
transferred, we don't need to use a whole 128-byte stack slot - we can
trim some space off the end of the range.
In order to implement this feature, we need to rely less on the
target-independent code in the PrologEpilogInserter, so we override
several new methods in `SIFrameLowering`. We also add new pseudos,
`SI_BLOCK_SPILL_V1024_SAVE/RESTORE`.
One peculiarity is that both the SI_BLOCK_V1024_RESTORE pseudo and the
SCRATCH_LOAD_BLOCK instructions will have all the registers that are not
transferred added as implicit uses. This is done in order to inform
LiveRegUnits that those registers are not available before the restore
(since we're not really restoring them - so we can't afford to scavenge
them). Unfortunately, this trick doesn't work with the save, so before
the save all the registers in the block will be unavailable (see the
unit test).
|
|
|
|
(#128095)
Callee saved registers should always be phyiscal registers. They are
often passed directly to other functions that take MCRegister like
getMinimalPhysRegClass or TargetRegisterClass::contains.
Unfortunately, sometimes the MCRegister is compared to a Register which
gave an ambiguous comparison error when the MCRegister is on the LHS.
Adding a MCRegister==Register comparison operator created more ambiguous
comparison errors elsewhere. These cases were usually comparing against
a base or frame pointer register that is a physical register in a
Register. For those I added an explicit conversion of Register to
MCRegister to fix the error.
|
|
|
|
using(). NFC
Many of these are indexing BitVectors or something where we can't
using MCRegister and need the register number.
|
|
Identified with misc-include-cleaner.
|
|
(#101840)"
This casuses assertion failures targeting 32-bit x86:
lib/Target/X86/X86RegisterInfo.cpp:989:
virtual bool llvm::X86RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator, int, unsigned int, RegScavenger *) const:
Assertion `(Is64Bit || FitsIn32Bits) && "Requesting 64-bit offset in 32-bit immediate!"' failed.
See comment on the PR.
> Fix 32-bit integer overflows in the X86 target frame layout when dealing
> with frames larger than 4gb. When this occurs, we'll scavenge a scratch
> register to be able to hold the correct stack offset for frame locals.
>
> This completes reapplying #84114.
>
> Fixes #48911
> Fixes #75944
> Fixes #87154
This reverts commit 0abb7791614947bc24931dd851ade31d02496977.
|
|
Fix 32-bit integer overflows in the X86 target frame layout when dealing
with frames larger than 4gb. When this occurs, we'll scavenge a scratch
register to be able to hold the correct stack offset for frame locals.
This completes reapplying #84114.
Fixes #48911
Fixes #75944
Fixes #87154
|
|
This patch fixes https://github.com/llvm/llvm-project/issues/17204.
If a base pointer is used in a function, and it is clobbered by an
instruction (typically an inline asm), current register allocator can't
handle this situation, so BP becomes garbage after those instructions.
It can also occur to FP in theory.
We can spill and reload FP/BP registers around those instructions. But
normal spill/reload instructions also use FP/BP, so we can't spill them
into normal spill slots, instead we spill them into the top of stack by
using SP register.
|
|
Emit an optimization remark when objects in the stack frame may cause
hazards in a streaming mode function. The analysis requires either the
`aarch64-stack-hazard-size` or `aarch64-stack-hazard-remark-size` flag
to be set by the user, with the former flag taking precedence.
|
|
than 2gb (#99263)
Rebase of #84114. I've only included the core changes to frame layout
calculation & CFI generation which sidesteps the regressions found after
merging #84114. Since these changes are a necessary precursor to the
overall fix and are themselves slightly beneficial as CFI is now
generated correctly, I think it is reasonable to merge this first step.
---
For very large stack frames, the offset from the stack pointer to a
local can be more than 2^31 which overflows various `int` offsets in the
frame lowering code.
This patch updates the frame lowering code to calculate the offsets as
64-bit values and fixes CFI to use the corrected sizes.
After this patch, additional work is needed to fix offset truncations in
each target's codegen.
|
|
- Add `MachineLoopAnalysis`.
- Add `MachineLoopPrinterPass`.
- Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.
|
|
result (#94571)
Prepare for new pass manager version of `MachineDominatorTreeAnalysis`.
We may need a machine dominator tree version of `DomTreeUpdater` to
handle `SplitCriticalEdge` in some CodeGen passes.
|
|
|
|
frames larger than 2gb (#84114)"
This is failing on some EXPENSIVE_CHECKS buildbots
|
|
For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code.
This patch updates the frame lowering code to calculate the offsets as 64-bit values and resolves the overflows, resulting in the correct codegen for very large frames.
Fixes #48911
|
|
This reverts commit 05bde30585710a51592eee0a6cf6df8184d09c92.
Reverting due to verifier complaints with expensive checks on build-bot.
|
|
Have the verifier report a missing AdjustsStack flag rather than waiting until
PEI asserts.
|
|
llvm-project/llvm/lib/CodeGen/PrologEpilogInserter.cpp:369:12:
error: unused variable 'MaxCFSIn' [-Werror,-Wunused-variable]
uint32_t MaxCFSIn =
^
1 error generated.
|
|
- Use computeMaxCallFrameSize() in PEI::calculateCallFrameInfo() instead of duplicating the code.
- Set AdjustsStack in FinalizeISel instead of in computeMaxCallFrameSize().
|
|
Track the live register state immediately before, instead of after,
MBBI. This makes it simple to track the state at the start or end of a
basic block without a separate (and poorly named) Tracking flag.
This changes the API of the backward(MachineBasicBlock::iterator I)
method, which now recedes to the state just before, instead of just
after, *I. Some clients are simplified by this change.
There is one small functional change shown in the lit tests where
multiple spilled registers all need to be reloaded before the same
instruction. The reloads will now be inserted in the opposite order.
This should not affect correctness.
|
|
It is already present in the yaml, but missing from the printed
diagnostics.
|
|
(#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
|
|
PPC64 allows stack size up to ((2^63)-1) bytes. Currently llc reports
```
warning: stack frame size (4294967568) exceeds limit (4294967295) in function 'main'
```
if the stack allocated is larger than 4G.
|
|
elimination
All targets that require register scavenging now use backwards frame
index elimination.
Differential Revision: https://reviews.llvm.org/D156986
|
|
Also rename the flag from supportsBackwardScavenger to
eliminateFrameIndicesBackwards to reflect what it actually does.
X86 is the only target still using forwards frame index elimination.
This will not block removing support for forwards register scavenging,
because X86 does not use the register scavenger.
Differential Revision: https://reviews.llvm.org/D156983
|
|
This adds support for reprocessing new instructions that were generated
by the target's eliminateFrameIndex.
Backwards frame index elimination uses backwards register scavenging,
which is preferred because it does not rely on accurate kill flags.
Differential Revision: https://reviews.llvm.org/D156690
|
|
A tail call may have $noreg operands.
Fixes a crash.
Reviewed By: xgupta
Differential Revision: https://reviews.llvm.org/D156485
|
|
This adds better support for call frame pseudos that adjust SP in
PEI::replaceFrameIndicesBackward.
Running frame index elimination backwards is preferred because it can
do backwards register scavenging (on targets that require scavenging)
which does not rely on accurate kill flags.
Differential Revision: https://reviews.llvm.org/D156434
|
|
Record the call frame size on entry to each basic block. This is usually
zero except when a basic block has been split in the middle of a call
sequence.
This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.
Differential Revision: https://reviews.llvm.org/D156113
|
|
This reverts commit 58d1eaa3b6ce4f7285c51f83faff7a3ac374c746.
|
|
D154281
|
|
Record the SP adjustment on entry to each basic block. This is almost
always zero except on targets like ARM which can split a basic block in
the middle of a call sequence.
This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.
Differential Revision: https://reviews.llvm.org/D154281
|
|
This adds support for running PEI::replaceFrameIndicesBackward with no
RegisterScavenger, and basic support for eliminating call frame pseudo
instructions.
Differential Revision: https://reviews.llvm.org/D154347
|
|
Differential Revision: https://reviews.llvm.org/D154346
|
|
A year ago when I was not invested at all into compilers, I found an assertion error when building an AArch64 debug build with LTO + CFI, among other combinations.
It was posted as a github issue here: https://github.com/llvm/llvm-project/issues/54088
I took it upon myself to revisit the issue now that I have spent some more time working on LLVM.
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D151276
|
|
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D152098
|
|
builds
|
|
While investigating a bug in Clang, I noticed that -Wframe-larger-than
was emitting extra debug information along with the diagnostic. It
turns out that 2e1e2f52f357768186ecfcc5ac53d5fa53d1b094 fixed an issue
with the diagnostic, but accidentally left in some debug code that was
exposed in all builds.
So now we no longer emit things like:
8/4294967304 (0.00%) spills, 4294967296/4294967304 (100.00%) variables
along with the diagnostic
|
|
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D148692
|