path: root/llvm/lib/CodeGen/TargetInstrInfo.cpp
Age  Commit message  Author  Files  Lines
6 days  [CodeGen] Rename isReallyTriviallyReMaterializable [nfc]  (Philip Reames, 1 file, -1/+1)
.. to isReMaterializableImpl. The "Really" naming has always been awkward, and we're working towards removing the "Trivial" part now, so go ahead and remove both pieces in a single rename. Note that this doesn't change any aspect of the current implementation; we still "mostly" only return instructions which are trivial (meaning no virtual register uses), but some targets do lie about that today.
10 days  CodeGen: Add RegisterClass by HwMode (#158269)  (Matt Arsenault, 1 file, -2/+5)
This is a generalization of the LookupPtrRegClass mechanism. AMDGPU has several use cases for swapping the register class of instruction operands based on the subtarget, but none of them really fit into the box of being pointer-like. The current system requires manual management of an arbitrary integer ID. For the AMDGPU use case, this would end up being around 40 new entries to manage. This just introduces the base infrastructure. I have ports of all the target specific usage of PointerLikeRegClass ready.
2025-09-12  CodeGen: Remove MachineFunction argument from getRegClass (#158188)  (Matt Arsenault, 1 file, -3/+2)
This is a low level utility to parse the MCInstrInfo and should not depend on the state of the function.
2025-09-12  CodeGen: Remove MachineFunction argument from getPointerRegClass (#158185)  (Matt Arsenault, 1 file, -1/+1)
getPointerRegClass is a layering violation. Its primary purpose is to determine how to interpret an MCInstrDesc's operands' RegClass fields. This should be context free and only depend on the subtarget. The model of this is also wrong, since this should be an instruction / operand specific property, not a global pointer class. Remove the function argument to help stage removal of this hook and avoid introducing any new obstacles to replacing it. The remaining uses of the function were to get the subtarget, which TargetRegisterInfo already belongs to. A few targets needed new subtarget-derived properties copied there.
2025-09-05  [TargetInstrInfo][AArch64] Don't assume register came from operand 0 in canCombine (#157210)  (Craig Topper, 1 file, -1/+1)
We already have the register number from the user operand. Use it instead of assuming it must be operand 0 of the producing instruction. Fixes #157118
2025-08-31  [CodeGen] Drop disjoint flag when reassociating (#156218)  (Philip Reames, 1 file, -0/+2)
This fixes a latent miscompile. To understand why the flag can't be preserved, consider the case where a0=0, a1=0, a2=-1, and s3=-1.
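As a rough illustration (not the exact operand pairing from the patch), the `disjoint` flag asserts that the two OR operands share no set bits, and reassociation changes which values end up paired, so the flag can become false:

```python
def or_is_disjoint(a, b):
    # "disjoint" on an OR means the operands share no set bits,
    # which is what lets the OR be treated like an ADD.
    return (a & b) == 0

a0, a1, a2 = 0b0011, 0b1100, 0b0110
# Original association: the inner OR pairs a0 with a1, and is disjoint.
assert or_is_disjoint(a0, a1)
# After reassociation the inner OR pairs a0 with a2 instead; blindly
# keeping the disjoint flag would now assert something false.
assert not or_is_disjoint(a0, a2)
```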
2025-08-04  [CodeGen] Remove an unnecessary cast (NFC) (#151901)  (Kazu Hirata, 1 file, -1/+1)
getOpcode() already returns unsigned.
2025-07-31  MachineInstrBuilder: Introduce copyMIMetadata() function.  (Peter Collingbourne, 1 file, -1/+1)
This reduces the amount of boilerplate required when adding a new field to MIMetadata and reduces the chance of bugs like the one I fixed in TargetInstrInfo::reassociateOps. Reviewers: arsenm, nikic Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/133535
2025-07-17  [TII] Do not fold undef copies (#147392)  (Jeffrey Byrnes, 1 file, -5/+11)
RegallocBase::cleanupFailedVReg hacks up the state of the liveness in order to facilitate producing valid IR. During this process, we may end up producing undef copies. If the destination of these copies is a spill candidate, we will attempt to fold the source register when issuing the spill. The undef of the source is not propagated to storeRegToStackSlot, thus we end up dropping the undef, issuing a spill, and producing an illegal liveness state. This checks for undef copies and, if found, inserts a kill instead of a spill.
2025-07-11  [CodeGen] Do not use substituteRegister to update implicit def (#148068)  (Peiming Liu, 1 file, -8/+21)
It seems `substituteRegister` checks `FromReg == ToReg` instead of `TRI->isSubRegisterEq`. This PR simply reverts the original PR (https://github.com/llvm/llvm-project/pull/131361) to its initial implementation (without using `substituteRegister`). Not sure whether it is the desired fix (and by no means am I an expert on the LLVM backend), but it does fix a numeric error on our internal workload. Original author: @sdesmalen-arm
2025-07-10  [CodeGen] commuteInstruction should update implicit-def (#131361)  (Sander de Smalen, 1 file, -1/+8)
When the RegisterCoalescer adds an implicit-def when coalescing a SUBREG_TO_REG (#123632), this causes issues when removing other COPY nodes by commuting the instruction because it doesn't take the implicit-def into consideration. This PR fixes that.
2025-05-22  [LLVM][CodeGen] Add convenience accessors for MachineFunctionProperties (#140002)  (Rahul Joshi, 1 file, -2/+1)
Add per-property has<Prop>/set<Prop>/reset<Prop> functions to MachineFunctionProperties.
2025-04-27  [llvm] Use range constructors of *Set (NFC) (#137552)  (Kazu Hirata, 1 file, -1/+1)
2025-04-18  [Analysis] Remove implicit LocationSize conversion from uint64_t (#133342)  (Philip Reames, 1 file, -1/+1)
This change removes the uint64_t constructor on LocationSize preventing implicit conversion, and fixes up the using APIs to adapt to the change. Note that I'm adding a couple of explicit conversion points on routines where passing in a fixed offset as an integer seems likely to have well understood semantics. We had an unfortunate case which arose if you tried to pass a TypeSize value to a parameter of LocationSize type. We'd find the implicit conversion path through TypeSize -> uint64_t -> LocationSize which works just fine for fixed values, but loses information and fails assertions if the TypeSize was scalable. This change breaks the first link in that implicit conversion chain since that seemed to be the easier one.
2025-03-26  [CodeGen] Provide a target independent default for optimizeLoadInst [NFC]  (Philip Reames, 1 file, -0/+41)
This just moves the x86 implementation into generic code since it appears to be suitable for any target. The heart of this transform is inside foldMemoryOperand so other targets won't actually kick in until they implement said API. This just removes one piece to implement in the process of enabling foldMemoryOperand.
2025-03-25  [Machine-Combiner] Add a pass to reassociate chains of accumulation instructions into a tree (#132728)  (Jonathan Cohen, 1 file, -11/+259)
This pass is designed to increase ILP by performing accumulation into multiple registers. It currently supports only the S/UABAL accumulation instruction, but can be extended to support additional instructions. Reland of #126060, which was reverted due to a conflict with #131272.
2025-03-23  Revert "[AArch64][MachineCombiner] Recombine long chains of accumulation instructions into a tree to increase ILP (#126060)" (#132607)  (Jonathan Cohen, 1 file, -262/+11)
This reverts commit c4caf949aa934a219e84d4ba0530bd535e698cdb.
2025-03-23  [AArch64][MachineCombiner] Recombine long chains of accumulation instructions into a tree to increase ILP (#126060)  (Jonathan Cohen, 1 file, -11/+262)
This pattern shows up often in media libraries. The optimization should only kick in for O3. Currently only supports a single family of accumulation instructions, but can easily be expanded to support additional instructions in the future.
2025-03-13  [MachineCombiner][Targets] Use Register in TII genAlternativeCodeSequence interface. NFC (#131272)  (Craig Topper, 1 file, -2/+2)
2025-01-13  [aarch64][win] Update Called Globals info when updating Call Site info (#122762)  (Daniel Paoliello, 1 file, -3/+3)
Fixes the "use after poison" issue introduced by #121516 (see <https://github.com/llvm/llvm-project/pull/121516#issuecomment-2585912395>). The root cause of this issue is that #121516 introduced "Called Global" information for call instructions, modeling how "Call Site" info is stored in the machine function; however, it didn't update the copy/move/erase operations for call site information to also handle it. The fix is to rename and update the existing copy/move/erase functions so they also take care of Called Global info.
2025-01-11  [AMDGPU] Add target hook to isGlobalMemoryObject (#112781)  (Austin Kerbow, 1 file, -0/+5)
We want special handling for IGLP instructions in the scheduler, but they should still be treated like they have side effects by other passes. Add a target hook to the ScheduleDAGInstrs DAG builder so that we have more control over this.
2024-08-27  [TII][RISCV] Add renamable bit to copyPhysReg (#91179)  (Piyou Chen, 1 file, -1/+3)
The renamable flag is useful during MachineCopyPropagation, but it will be dropped after lowerCopy in some cases. This patch introduces extra arguments to pass the renamable flag to copyPhysReg.
2024-07-24  MachineOutliner: Use PM to query MachineModuleInfo (#99688)  (Matt Arsenault, 1 file, -4/+6)
Avoid getting this from the MachineFunction
2024-07-01  [llvm][CodeGen] Avoid 'raw_string_ostream::str' (NFC) (#97318)  (Youngsuk Kim, 1 file, -2/+2)
Since `raw_string_ostream` doesn't own the string buffer, it is desirable (in terms of memory safety) for users to directly reference the string buffer rather than use `raw_string_ostream::str()`. Work towards TODO comment to remove `raw_string_ostream::str()`.
2024-04-23  [CodeGen][TII] Allow reassociation on custom operand indices (#88306)  (Min-Yih Hsu, 1 file, -43/+100)
This opens up a door for reusing reassociation optimizations on target-specific binary operations with non-standard operand list. This is effectively a NFC.
2024-04-11  [clang][llvm] Remove "implicit-section-name" attribute (#87906)  (Arthur Eubanks, 1 file, -2/+1)
D33412/D33413 introduced this to support a clang pragma to set section names for a symbol depending on if it would be placed in bss/data/rodata/text, which may not be known until the backend. However, for text we know that only functions will go there, so just directly set the section in clang instead of going through a completely separate attribute. Autoupgrade the "implicit-section-name" attribute to directly setting the section on a Function.
2024-04-11  [MachineCombiner][NFC] Split target-dependent patterns  (Pengcheng Wang, 1 file, -8/+11)
We split target-dependent MachineCombiner patterns into their target folders. This makes MachineCombiner much more target-independent. Reviewers: davemgreen, asavonic, rotateright, RKSimon, lukel97, LuoYuanke, topperc, mshockwave, asi-sc Reviewed By: topperc, mshockwave Pull Request: https://github.com/llvm/llvm-project/pull/87991
2024-03-17  [CodeGen] Use LocationSize for MMO getSize (#84751)  (David Green, 1 file, -1/+2)
This is part of #70452 that changes the type used for the external interface of MMO to LocationSize as opposed to uint64_t. This means the constructors take LocationSize, and convert ~UINT64_C(0) to LocationSize::beforeOrAfter(). The getSize methods return a LocationSize. This allows us to be more precise with unknown sizes, not accidentally treating them as unsigned values, and in the future should allow us to add proper scalable vector support, but none of that is included in this patch. It should mostly be an NFC. Global ISel is still expected to use the underlying LLT as it needs, and is not expected to see unknown sizes for generic operations. Most of the changes are hopefully fairly mechanical, adding a lot of getValue() calls and protecting them with hasValue() where needed.
2024-03-06  [Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875)  (David Green, 1 file, -1/+1)
This is another part of #70452 which makes getMemOperandsWithOffsetWidth use a LocationSize for Width, as opposed to the unsigned it currently uses. The advantages on its own are not super high if getMemOperandsWithOffsetWidth usually uses known sizes, but if the values can come from an MMO it can help be more accurate in case they are Unknown (and, in the future, scalable).
2023-12-05  TargetInstrInfo, TargetSchedule: fix non-NFC parts of 9468de4 (#74338)  (Ramkumar Ramachandra, 1 file, -1/+1)
Follow up on a post-commit review of 9468de4 (TargetInstrInfo: make getOperandLatency return optional (NFC)) by Bjorn Pettersson to fix a couple of things that are not NFC:
- std::optional<T>::operator<= returns true if the first operand is a std::nullopt and second operand is T. Fix a couple of places where we assumed it would return false.
- In TargetSchedule, computeInstrCost could take another codepath, returning InstrLatency instead of DefaultDefLatency. Fix one instance not accounting for this behavior.
2023-12-04  [TargetInstrInfo] update INLINEASM memoperands once (#74135)  (Nick Desaulniers, 1 file, -22/+21)
In commit b05335989239 ("[X86InstrInfo] support memfold on spillable inline asm (#70832)"), I had a last minute fix to update the memoperands. I originally did this in the parent foldInlineAsmMemOperand call, updated the mir test via update_mir_test_checks.py, but then decided to move it to the child call of foldInlineAsmMemOperand. But I forgot to rerun update_mir_test_checks.py. That last minute change caused the same memoperand to be added twice when recursion occurred (for tied operands). I happened to get lucky that trailing content omitted from the CHECK line doesn't result in test failure. But rerunning update_mir_test_checks.py on the mir test added in that commit produces updated output. This is resulting in updates to the test that:
1. conflate additions to the test in child commits with simply updating the test as it should have been when first committed.
2. look wrong because the same memoperand is specified twice (we don't deduplicate memoperands when added). Example: INLINEASM ... :: (load (s32) from %stack.0) (load (s32) from %stack.0)
Fix the bug, so that in child commits we don't have additional unrelated test changes (which would be wrong anyway) from simply running update_mir_test_checks.py. Link: #20571
2023-12-01  Fix MSVC signed/unsigned mismatch warning. NFC.  (Simon Pilgrim, 1 file, -1/+1)
2023-12-01  TargetInstrInfo: make getOperandLatency return optional (NFC) (#73769)  (Ramkumar Ramachandra, 1 file, -12/+11)
getOperandLatency has the following behavior: it returns -1 as a special value, negative numbers other than -1 on some target-specific overrides, or a valid non-negative latency. This behavior can be surprising, as some callers do arithmetic on these negative values. Change the interface of getOperandLatency to return a std::optional<unsigned> to prevent surprises in callers. While at it, change the interface of getInstrLatency to return unsigned instead of int. This change was inspired by a refactoring in TargetSchedModel::computeOperandLatency.
2023-11-29  [X86InstrInfo] support memfold on spillable inline asm (#70832)  (Nick Desaulniers, 1 file, -12/+21)
This enables -regalloc=greedy to memfold spillable inline asm MachineOperands. Because no instruction selection framework marks MachineOperands as spillable, no language frontend can observe functional changes from this patch. That will change once instruction selection frameworks are updated. Link: https://github.com/llvm/llvm-project/issues/20571
2023-11-22  [AArch64] Use the same fast math preservation for MachineCombiner reassociation as X86/PowerPC/RISCV. (#72820)  (Craig Topper, 1 file, -4/+16)
Don't blindly copy the original flags from the pre-reassociated instructions. This copied the integer poison flags, which are not safe to preserve after reassociation. For the FP flags, I think we should only keep the intersection of the flags. Override setSpecialOperandAttr to do this. Fixes #72777.
2023-11-21  reapply "[TargetInstrInfo] enable foldMemoryOperand for InlineAsm (#70743)" (#72910)  (Nick Desaulniers, 1 file, -0/+62)
This reverts commit 42204c94ba9fcb0b4b1335e648ce140a3eef8a9d. It was accidentally backed out. #20571 #70743
2023-11-19  Revert "[TargetInstrInfo] enable foldMemoryOperand for InlineAsm (#70743)"  (Bill Wendling, 1 file, -62/+0)
This reverts commit 99ee2db198d86f685bcb07a1495a7115ffc31d7e. It's causing ICEs in the ARM tests. See the comment here: https://github.com/llvm/llvm-project/commit/99ee2db198d86f685bcb07a1495a7115ffc31d7e
2023-11-17  [TargetInstrInfo] enable foldMemoryOperand for InlineAsm (#70743)  (Nick Desaulniers, 1 file, -0/+62)
foldMemoryOperand looks at pairs of instructions (generally a load to virt reg then use of the virtreg, or def of a virtreg then a store) and attempts to combine them. This can reduce register pressure. A prior commit added the ability to mark such a MachineOperand as foldable. In terms of INLINEASM, this means that "rm" was used (rather than just "r") to denote that the INLINEASM may use a memory operand rather than a register operand. This effectively undoes decisions made by the instruction selection framework. Callers will be added in the register allocation frameworks. This has been tested with all of the above (which will come as follow up patches). Thanks to @topperc who suggested this at last year's LLVM US Dev Meeting and @qcolombet who confirmed this was the right approach. Link: https://github.com/llvm/llvm-project/issues/20571
2023-11-03  [InlineAsm] Steal a bit to denote a register is foldable (#70738)  (Nick Desaulniers, 1 file, -0/+4)
When using the inline asm constraint string "rm" (or "g"), we generally would like the compiler to choose "r", but it is permitted to choose "m" if there's register pressure. This is distinct from "r" in which the register is not permitted to be spilled to the stack. The decision of which to use must be made at some point. Currently, the instruction selection frameworks (ISELs) make the choice, and the register allocators had better be able to handle the result. Steal a bit from Storage when using register operands to disambiguate between the two cases. Add helpers/getters/setters, and print in MIR when such a register is foldable. The getter will later be used by the register allocation frameworks (and asserted by the ISELs) while the setters will be used by the instruction selection frameworks. Link: https://github.com/llvm/llvm-project/issues/20571
2023-10-27  [BasicBlockSections] Apply path cloning with -basic-block-sections. (#68860)  (Rahman Lavaee, 1 file, -3/+12)
https://github.com/llvm/llvm-project/commit/28b912687900bc0a67cd61c374fce296b09963c4 introduced the path cloning format in the basic-block-sections profile. This PR validates and applies path clonings. A path cloning is valid if all of these conditions hold:
1. All bb ids in the path are mapped to existing blocks.
2. Each two consecutive bb ids in the path have a successor relationship in the CFG.
3. The path does not include a block with indirect branches, except possibly as the last block.
Applying a path cloning involves cloning all blocks in the path (except the first one) and setting up their branches. Once all clonings are applied, the cluster information is used to guide block layout in the modified function.
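The three validity conditions can be sketched as a small check (hypothetical data structures, not the actual pass):

```python
def is_valid_cloning(path, blocks, successors, has_indirect_branch):
    """Check the three path-cloning validity conditions."""
    # 1. Every bb id in the path maps to an existing block.
    if any(b not in blocks for b in path):
        return False
    # 2. Consecutive bb ids must have a successor relationship in the CFG.
    if any(b not in successors[a] for a, b in zip(path, path[1:])):
        return False
    # 3. No indirect-branch block, except possibly as the last block.
    return not any(has_indirect_branch(b) for b in path[:-1])

# Tiny CFG: 0 -> 1 -> 2, where block 1 ends in an indirect branch.
blocks = {0, 1, 2}
succs = {0: {1}, 1: {2}, 2: set()}
indirect = lambda b: b == 1
assert is_valid_cloning([0, 1], blocks, succs, indirect)         # 1 is last: ok
assert not is_valid_cloning([0, 1, 2], blocks, succs, indirect)  # 1 mid-path
assert not is_valid_cloning([0, 2], blocks, succs, indirect)     # not a succ
```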
2023-09-13  reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66264)  (Nick Desaulniers, 1 file, -1/+1)
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003) This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5. Fix up build failures in targets I missed in #66003. Kept as 3 commits for reviewers to see better what's changed. Will squash when merging.
- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
2023-09-13  Revert "[InlineAsm] wrap ConstraintCode in enum class NFC (#66003)"  (Reid Kleckner, 1 file, -1/+1)
This reverts commit 2ca4d136124d151216aac77a0403dcb5c5835bcd. Also revert the followup, "[InlineAsm] fix botched merge conflict resolution" This reverts commit 8b9bf3a9f715ee5dce96eb1194441850c3663da1. There were SystemZ and Mips build errors, too many to fix forward.
2023-09-13  [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)  (Nick Desaulniers, 1 file, -1/+1)
Similar to commit 2fad6e69851e ("[InlineAsm] wrap Kind in enum class NFC") Fix the TODOs added in commit 93bd428742f9 ("[InlineAsm] refactor InlineAsm class NFC (#65649)")
2023-09-11  [InlineAsm] refactor InlineAsm class NFC (#65649)  (Nick Desaulniers, 1 file, -9/+8)
I would like to steal one of these bits to denote whether a kind may be spilled by the register allocator or not, but I'm afraid to touch any of this code using bitwise operations. Make flags a first class type using bitfields, rather than launder data around via `unsigned`.
2023-08-31  [InlineAsm] wrap Kind in enum class NFC  (Nick Desaulniers, 1 file, -1/+1)
Should add some minor type safety to the use of this information, since there's quite a bit of metadata being laundered through an `unsigned`. I'm looking to potentially add more bitfields to that `unsigned`, but I find InlineAsm's big ol' bag of enum values and usage of `unsigned` confusing, type-unsafe, and un-ergonomic. These can probably be better abstracted. I think the lack of static_cast outside of InlineAsm indicates the prior code smell fixed here. Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D159242
2023-08-24  [CodeGen][AArch64] Don't split functions with a red zone on AArch64  (Daniel Hoekwater, 1 file, -0/+20)
Because unconditional branch relaxation on AArch64 grows the stack to spill a register, splitting a function would cause the red zone to be overwritten. Explicitly disable MFS for such functions. Differential Revision: https://reviews.llvm.org/D157127
2023-08-19  [llvm] Remove redundant control flow statements (NFC)  (Kazu Hirata, 1 file, -1/+0)
2023-08-07  [TII] NFCI: Simplify the interface for isTriviallyReMaterializable  (Sander de Smalen, 1 file, -1/+1)
Currently `isTriviallyReMaterializable` calls `isReallyTriviallyReMaterializable` and `isReallyTriviallyReMaterializableGeneric`. The two interfaces are confusing, but there are also some real issues with this. The documentation of this function (see below) suggests that `isReallyTriviallyReMaterializable` allows the target to override the default behaviour.

    /// For instructions with opcodes for which the M_REMATERIALIZABLE flag is
    /// set, this hook lets the target specify whether the instruction is actually
    /// trivially rematerializable, taking into consideration its operands.

It however implements something different. The default behaviour is the analysis done in `isReallyTriviallyReMaterializableGeneric`, which is testing if it is safe to rematerialize the MachineInstr. The result of `isReallyTriviallyReMaterializable` is only considered if `isReallyTriviallyReMaterializableGeneric` returns `false`. That means there is no way to override the default behaviour if `isReallyTriviallyReMaterializableGeneric` returns true (i.e. it is safe to rematerialize, but we'd rather not). By making this a single interface, we can override the interface to do either. Reviewed By: craig.topper, nemanjai Differential Revision: https://reviews.llvm.org/D156520
2023-07-31  Reapply "[CodeGen] Allow targets to use target specific COPY instructions for live range splitting"  (Matt Arsenault, 1 file, -3/+4)
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd. The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work around the underlying problem with SUBREG_TO_REG.
2023-07-27  [CodeGen] Store call frame size in MachineBasicBlock  (Jay Foad, 1 file, -0/+16)
Record the call frame size on entry to each basic block. This is usually zero except when a basic block has been split in the middle of a call sequence. This simplifies PEI::replaceFrameIndices which previously had to visit basic blocks in a specific order and had special handling for unreachable blocks. More importantly it paves the way for an equally simple implementation of a backwards version of replaceFrameIndices, which is required to fully convert PrologEpilogInserter to backwards register scavenging, which is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D156113