Age | Commit message (Collapse) | Author | Files | Lines |
|
.. to isReMaterializableImpl. The "Really" naming has always been
awkward, and we're working towards removing the "Trivial" part now,
so go ehead and remove both pieces in a single rename.
Note that this doesn't change any aspect of the current
implementation; we still "mostly" only return instructions which
are trivial (meaning no virtual register uses), but some targets
do lie about that today.
|
|
This is a generalization of the LookupPtrRegClass mechanism.
AMDGPU has several use cases for swapping the register class of
instruction operands based on the subtarget, but none of them
really fit into the box of being pointer-like.
The current system requires manual management of an arbitrary integer
ID. For the AMDGPU use case, this would end up being around 40 new
entries to manage.
This just introduces the base infrastructure. I have ports of all
the target specific usage of PointerLikeRegClass ready.
|
|
This is a low level utility to parse the MCInstrInfo and should
not depend on the state of the function.
|
|
getPointerRegClass is a layering violation. Its primary purpose
is to determine how to interpret an MCInstrDesc's operands RegClass
fields. This should be context free, and only depend on the subtarget.
The model of this is also wrong, since this should be an
instruction / operand specific property, not a global pointer class.
Remove the the function argument to help stage removal of this hook
and avoid introducing any new obstacles to replacing it.
The remaining uses of the function were to get the subtarget, which
TargetRegisterInfo already belongs to. A few targets needed new
subtarget derived properties copied there.
|
|
canCombine (#157210)
We already have the register number from the user operand. Use it
instead of assuming it must be operand 0 of the producing instruction.
Fixes #157118
|
|
This fixes a latent miscompile. To understand why the flag can't be
preserved, consider the case where a0=0, a1=0, a2=-1, and s3=-1.
|
|
getOpcode() already returns unsigned.
|
|
This reduces the amount of boilerplate required when adding a new
field to MIMetadata and reduces the chance of bugs like the
one I fixed in TargetInstrInfo::reassociateOps.
Reviewers: arsenm, nikic
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/133535
|
|
RegallocBase::cleanupFailedVReg hacks up the state of the liveness in
order to facilitate producing valid IR. During this process, we may end
up producing undef copies.
If the destination of these copies is a spill candidate, we will attempt
to fold the source register when issuing the spill. The undef of the
source is not propagated to storeRegToStackSlot , thus we end up
dropping the undef, issuing a spill, and producing an illegal liveness
state.
This checks for undef copies, and, if found, inserts a kill instead of
spill.
|
|
It seems `subsituteRegister` checks `FromReg == ToReg` instead of
`TRI->isSubRegisterEq`.
This PR simply reverts the original PR
(https://github.com/llvm/llvm-project/pull/131361) to its initial
implementation (without using `subsituteRegister`).
Not sure whether it is a desired fix (and by no means that I am an
expert on LLVM backend), but it does fix a numeric error on our internal
workload.
Original author: @sdesmalen-arm
|
|
When the RegisterCoalescer adds an implicit-def when coalescing
a SUBREG_TO_REG (#123632), this causes issues when removing other
COPY nodes by commuting the instruction because it doesn't take
the implicit-def into consideration. This PR fixes that.
|
|
(#140002)
Add per-property has<Prop>/set<Prop>/reset<Prop> functions to
MachineFunctionProperties.
|
|
|
|
This change removes the uint64_t constructor on LocationSize
preventing implicit conversion, and fixes up the using APIs to adapt to
the change. Note that I'm adding a couple of explicit conversion points
on routines where passing in a fixed offset as an integer seems likely
to have well understood semantics.
We had an unfortunate case which arose if you tried to pass a TypeSize
value to a parameter of LocationSize type. We'd find the implicit
conversion path through TypeSize -> uint64_t -> LocationSize which works
just fine for fixed values, but looses information and fails assertions
if the TypeSize was scalable. This change breaks the first link in that
implicit conversion chain since that seemed to be the easier one.
|
|
This just moves the x86 implementation into generic code since it appears
to be suitable for any target. The heart of this transform is inside
foldMemoryOperand so other targets won't actually kick in until they
implement said API. This just removes one piece to implement in the
process of enabling foldMemoryOperand.
|
|
instructions into a tree (#132728)
This pass is designed to increase ILP by performing accumulation into
multiple registers. It currently supports only the S/UABAL accumulation
instruction, but can be extended to support additional instructions.
Reland of #126060 which was reverted due to a conflict with #131272.
|
|
instructions into a tree to increase ILP (#126060) (#132607)
This reverts commit c4caf949aa934a219e84d4ba0530bd535e698cdb.
|
|
instructions into a tree to increase ILP (#126060)
This pattern shows up often in media libraries. The optimization should only
kick in for O3. Currently only supports a single family of accumulation
instructions, but can easily be expanded to support additional
instructions in the future.
|
|
interface. NFC (#131272)
|
|
Fixes the "use after poison" issue introduced by #121516 (see
<https://github.com/llvm/llvm-project/pull/121516#issuecomment-2585912395>).
The root cause of this issue is that #121516 introduced "Called Global"
information for call instructions modeling how "Call Site" info is
stored in the machine function, HOWEVER it didn't copy the
copy/move/erase operations for call site information.
The fix is to rename and update the existing copy/move/erase functions
so they also take care of Called Global info.
|
|
We want special handing for IGLP instructions in the scheduler but they
should still be treated like they have side effects by other passes. Add
a target hook to the ScheduleDAGInstrs DAG builder so that we have more
control over this.
|
|
The renamable flag is useful during MachineCopyPropagation but renamable
flag will be dropped after lowerCopy in some case.
This patch introduces extra arguments to pass the renamable flag to
copyPhysReg.
|
|
Avoid getting this from the MachineFunction
|
|
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.
Work towards TODO comment to remove `raw_string_ostream::str()`.
|
|
This opens up a door for reusing reassociation optimizations on
target-specific binary operations with non-standard operand list.
This is effectively a NFC.
|
|
D33412/D33413 introduced this to support a clang pragma to set section
names for a symbol depending on if it would be placed in
bss/data/rodata/text, which may not be known until the backend. However,
for text we know that only functions will go there, so just directly set
the section in clang instead of going through a completely separate
attribute.
Autoupgrade the "implicit-section-name" attribute to directly setting
the section on a Fuction.
|
|
We split target-dependent MachineCombiner patterns into their target
folder.
This makes MachineCombiner much more target-independent.
Reviewers:
davemgreen, asavonic, rotateright, RKSimon, lukel97, LuoYuanke, topperc, mshockwave, asi-sc
Reviewed By: topperc, mshockwave
Pull Request: https://github.com/llvm/llvm-project/pull/87991
|
|
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.
This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.
Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
|
|
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on it's own are not super high if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
|
|
Follow up on a post-commit review of 9468de4 (TargetInstrInfo: make
getOperandLatency return optional (NFC)) by Bjorn Pettersson to fix a
couple of things that are not NFC:
- std::optional<T>::operator<= returns true if the first operand is a
std::nullopt and second operand is T. Fix a couple of places where we
assumed it would return false.
- In TargetSchedule, computeInstrCost could take another codepath,
returning InstrLatency instead of DefaultDefLatency. Fix one instance
not accounting for this behavior.
|
|
In commit b05335989239 ("[X86InstrInfo] support memfold on spillable
inline asm
(#70832)"), I had a last minute fix to update the memoperands. I
originally
did this in the parent foldInlineAsmMemOperand call, updated the mir
test via
update_mir_test_checks.py, but then decided to move it to the child call
of
foldInlineAsmMemOperand.
But I forgot to rerun update_mir_test_checks.py. That last minute change
caused
the same memoperand to be added twice when recursion occurred (for tied
operands). I happened to get lucky that trailing content omitted from
the
CHECK line doesn't result in test failure.
But rerunning update_mir_test_checks.py on the mir test added in that
commit
produces updated output. This is resulting in updates to the test that:
1. conflate additions to the test in child commits with simply updating
the
test as it should have been when first committed.
2. look wrong because the same memoperand is specified twice (we don't
deduplicate memoperands when added). Example:
INLINEASM ... :: (load (s32) from %stack.0) (load (s32) from %stack.0)
Fix the bug, so that in child commits, we don't have additional
unrelated test
changes (which would be wrong anyways) from simply running
update_mir_test_checks.py.
Link: #20571
|
|
|
|
getOperandLatency has the following behavior: it returns -1 as a special
value, negative numbers other than -1 on some target-specific overrides,
or a valid non-negative latency. This behavior can be surprising, as
some callers do arithmetic on these negative values. Change the
interface of getOperandLatency to return a std::optional<unsigned> to
prevent surprises in callers. While at it, change the interface of
getInstrLatency to return unsigned instead of int.
This change was inspired by a refactoring in
TargetSchedModel::computeOperandLatency.
|
|
This enables -regalloc=greedy to memfold spillable inline asm
MachineOperands.
Because no instruction selection framework marks MachineOperands as
spillable, no language frontend can observe functional changes from this
patch. That will change once instruction selection frameworks are
updated.
Link: https://github.com/llvm/llvm-project/issues/20571
|
|
reassociation as X86/PowerPC/RISCV. (#72820)
Don't blindly copy the original flags from the pre-reassociated
instrutions.
This copied the integer poison flags which are not safe to preserve
after reassociation.
For the FP flags, I think we should only keep the intersection of
the flags. Override setSpecialOperandAttr to do this.
Fixes #72777.
|
|
(#72910)
This reverts commit 42204c94ba9fcb0b4b1335e648ce140a3eef8a9d.
It was accidentally backed out.
#20571
#70743
|
|
This reverts commit 99ee2db198d86f685bcb07a1495a7115ffc31d7e.
It's causing ICEs in the ARM tests. See the comment here:
https://github.com/llvm/llvm-project/commit/99ee2db198d86f685bcb07a1495a7115ffc31d7e
|
|
foldMemoryOperand looks at pairs of instructions (generally a load to
virt reg then use of the virtreg, or def of a virtreg then a store) and
attempts to combine them. This can reduce register pressure.
A prior commit added the ability to mark such a MachineOperand as
foldable. In terms of INLINEASM, this means that "rm" was used (rather
than just "r") to denote that the INLINEASM may use a memory operand
rather than a register operand. This effectively undoes decisions made
by the instruction selection framework. Callers will be added in the
register allocation frameworks. This has been tested with all of the
above (which will come as follow up patches).
Thanks to @topperc who suggested this at last years LLVM US Dev Meeting
and @qcolombet who confirmed this was the right approach.
Link: https://github.com/llvm/llvm-project/issues/20571
|
|
When using the inline asm constraint string "rm" (or "g"), we generally
would like the compiler to choose "r", but it is permitted to choose "m"
if there's register pressure. This is distinct from "r" in which the
register is not permitted to be spilled to the stack.
The decision of which to use must be made at some point. Currently, the
instruction selection frameworks (ISELs) make the choice, and the
register allocators had better be able to handle the result.
Steal a bit from Storage when using register operands to disambiguate
between the two cases. Add helpers/getters/setters, and print in MIR
when such a register is foldable.
The getter will later be used by the register allocation frameworks (and
asserted by the ISELs) while the setters will be used by the instruction
selection frameworks.
Link: https://github.com/llvm/llvm-project/issues/20571
|
|
https://github.com/llvm/llvm-project/commit/28b912687900bc0a67cd61c374fce296b09963c4
introduced the path cloning format in the basic-block-sections profile.
This PR validates and applies path clonings.
A path cloning is valid if all of these conditions hold:
1. All bb ids in the path are mapped to existing blocks.
2. Each two consecutive bb ids in the path have a successor relationship
in the CFG.
3. The path does not include a block with indirect branches, except
possibly as the last block.
Applying a path cloning involves cloning all blocks in the path (except
the first one) and setting up their branches.
Once all clonings are applied, the cluster information is used to guide
block layout in the modified function.
|
|
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5.
Fix up build failures in targets I missed in #66003
Kept as 3 commits for reviewers to see better what's changed. Will
squash when
merging.
- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
|
|
This reverts commit 2ca4d136124d151216aac77a0403dcb5c5835bcd.
Also revert the followup, "[InlineAsm] fix botched merge conflict resolution"
This reverts commit 8b9bf3a9f715ee5dce96eb1194441850c3663da1.
There were SystemZ and Mips build errors, too many to fix forward.
|
|
Similar to
commit 2fad6e69851e ("[InlineAsm] wrap Kind in enum class NFC")
Fix the TODOs added in
commit 93bd428742f9 ("[InlineAsm] refactor InlineAsm class NFC
(#65649)")
|
|
I would like to steal one of these bits to denote whether a kind may be
spilled by the register allocator or not, but I'm afraid to touch of any
this code using bitwise operands.
Make flags a first class type using bitfields, rather than launder data
around via `unsigned`.
|
|
Should add some minor type safety to the use of this information, since
there's quite a bit of metadata being laundered through an `unsigned`.
I'm looking to potentially add more bitfields to that `unsigned`, but I
find InlineAsm's big ol' bag of enum values and usage of `unsigned`
confusing, type-unsafe, and un-ergonomic. These can probably be better
abstracted.
I think the lack of static_cast outside of InlineAsm indicates the prior
code smell fixed here.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D159242
|
|
Because unconditional branch relaxation on AArch64 grows the stack to
spill a register, splitting a function would cause the red zone to be
overwritten. Explicitly disable MFS for such functions.
Differential Revision: https://reviews.llvm.org/D157127
|
|
|
|
Currently `isTriviallyReMaterializable` calls
`isReallyTriviallyReMaterializable` and
`isReallyTriviallyReMaterializableGeneric`. The two interfaces
are confusing, but there are also some real issues with this.
The documentation of this function (see below) suggests that
`isReallyTriviallyRematerializable` allows the target to override the
default behaviour.
/// For instructions with opcodes for which the M_REMATERIALIZABLE flag is
/// set, this hook lets the target specify whether the instruction is actually
/// trivially rematerializable, taking into consideration its operands.
It however implements something different. The default behaviour
is the analysis done in `isReallyTriviallyReMaterializableGeneric`,
which is testing if it is safe to rematerialize the MachineInstr.
The result of `isReallyTriviallyReMaterializable` is only considered if
`isReallyTriviallyReMaterializableGeneric` returns `false`. That means
there is no way to override the default behaviour if
`isReallyTriviallyReMaterializableGeneric` returns true (i.e. it is safe to
rematerialize, but we'd rather not).
By making this a single interface, we can override the interface to do either.
Reviewed By: craig.topper, nemanjai
Differential Revision: https://reviews.llvm.org/D156520
|
|
live range splitting"
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.
The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work
around the underlying problem with SUBREG_TO_REG.
|
|
Record the call frame size on entry to each basic block. This is usually
zero except when a basic block has been split in the middle of a call
sequence.
This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.
Differential Revision: https://reviews.llvm.org/D156113
|