path: root/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
Age | Commit message | Author | Files | Lines
2025-03-31[IRBuilder] Add new overload for CreateIntrinsic (#131942)Rahul Joshi1-6/+5
Add a new `CreateIntrinsic` overload with no `Types`, useful for creating calls to non-overloaded intrinsics that don't need additional mangling.
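A minimal sketch, assuming the new overload simply drops the `Types` array; `llvm.amdgcn.mbcnt.lo` is used here because it is non-overloaded, so there is nothing to mangle:

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"

using namespace llvm;

static Value *emitMbcntLo(IRBuilder<> &B, Value *Mask) {
  // Old form: B.CreateIntrinsic(Intrinsic::amdgcn_mbcnt_lo, {}, {Mask, B.getInt32(0)});
  // New form: no Types array, since the intrinsic needs no mangling.
  return B.CreateIntrinsic(Intrinsic::amdgcn_mbcnt_lo, {Mask, B.getInt32(0)});
}
```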
2025-02-28[AMDGPU] Cosmetic tweaks in AMDGPUAtomicOptimizer. NFC. (#129081)Jay Foad1-19/+9
Simplify iteration over the ToReplace vector, and some related cosmetic cleanups.
2025-02-24AMDGPU: Fix creating illegally typed readfirstlane in atomic optimizer (#128388)Matt Arsenault1-2/+9
We need to promote 8/16-bit cases to 32-bit. Unfortunately we are missing demanded bits optimizations on readfirstlane, so we end up emitting an and instruction on the input. I'm also surprised this pass isn't handling half or bfloat yet.
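A hedged sketch of the promotion described above (a hypothetical helper, not the pass's exact code): widen to i32, do the readfirstlane, then truncate back to the original narrow type.

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"

using namespace llvm;

static Value *readFirstLanePromoted(IRBuilder<> &B, Value *V) {
  Type *OrigTy = V->getType();               // e.g. i8 or i16
  Value *Ext = B.CreateZExt(V, B.getInt32Ty());
  Value *RFL =
      B.CreateIntrinsic(B.getInt32Ty(), Intrinsic::amdgcn_readfirstlane, {Ext});
  return B.CreateTrunc(RFL, OrigTy);         // back to the original narrow type
}
```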
2025-01-24[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583)Jeremy Morse1-1/+1
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call sites where that's necessary were updated a year ago, but to ensure some type safety we'd like to have all calls to moveBefore use iterators. This patch adds a (guaranteed dereferenceable) iterator-taking moveBefore and changes a bunch of call sites where it's obviously safe to do so, by just calling getIterator() on an instruction pointer. A follow-up patch will contain the less obviously safe changes. We'll eventually deprecate and remove the instruction-pointer insertBefore, but not before adding concise documentation of what considerations are needed (very few).
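A small illustration of the iterator-taking form (the helper name is hypothetical, not from the patch):

```cpp
#include "llvm/IR/Instruction.h"

using namespace llvm;

static void hoistBefore(Instruction *I, Instruction *InsertPt) {
  // Previously: I->moveBefore(InsertPt);
  // The iterator also carries the debug-info bit described above.
  I->moveBefore(InsertPt->getIterator());
}
```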
2024-12-03[AMDGPU] Refine AMDGPUAtomicOptimizerImpl class. NFC. (#118302)Jay Foad1-47/+39
Use references instead of pointers for most state and common up some of the initialization between the legacy and new pass manager paths.
2024-10-18Fix typo "instrinsic" (#112899)Jay Foad1-1/+1
2024-10-17[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)Jay Foad1-5/+3
Convert many instances of: Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...) to the equivalent CreateIntrinsic call.
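A hedged before/after sketch of the conversion, using the type-mangled strict.wwm intrinsic as an example (the helper is hypothetical):

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"

using namespace llvm;

static Value *emitStrictWwm(IRBuilder<> &B, Value *V) {
  // Before (two steps):
  //   Module *M = B.GetInsertBlock()->getModule();
  //   Function *Fn = Intrinsic::getOrInsertDeclaration(
  //       M, Intrinsic::amdgcn_strict_wwm, {V->getType()});
  //   return B.CreateCall(Fn, {V});
  // After: one call; the builder looks up or inserts the declaration itself.
  return B.CreateIntrinsic(Intrinsic::amdgcn_strict_wwm, {V->getType()}, {V});
}
```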
2024-10-11[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752)Rahul Joshi1-12/+12
Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation for adding a new `Intrinsic::getDeclaration` that will behave like `Module::getFunction` (i.e., lookup only, no creation).
2024-09-04[AMDGPU] Improve codegen for GFX10+ DPP reductions and scans (#107108)Jay Foad1-8/+10
Use poison for an unused input to the permlanex16 intrinsic, to improve register allocation and avoid an unnecessary v_mov instruction.
2024-08-23[AMDGPU] Remove comment outdated by #96933Jay Foad1-2/+1
2024-07-15[AMDGPU] Enable atomic optimizer for divergent i64 and double values (#96934)Vikram Hegde1-11/+30
2024-07-13[AMDGPU] Re-enable atomic optimization of uniform fadd/fsub with result (#97604)Jay Foad1-14/+14
Fix various problems to do with the first active lane of the result of optimized fp atomics, as explained in the comment. Fixes #97554
2024-07-08[AMDGPU] Fix -Wunused-variable in AMDGPUAtomicOptimizer.cpp (NFC)Jie Fu1-1/+1
/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:688:18: error: unused variable 'TyBitWidth' [-Werror,-Wunused-variable] const unsigned TyBitWidth = DL->getTypeSizeInBits(Ty); ^ 1 error generated.
2024-07-08[AMDGPU] Cleanup bitcast spam in atomic optimizer (#96933)Vikram Hegde1-84/+26
2024-07-03[AMDGPU] Disable atomic optimization of fadd/fsub with result (#96479)Jay Foad1-1/+14
An atomic fadd instruction like this should return %x: ; value at %ptr is %x %r = atomicrmw fadd ptr %ptr, float %y After atomic optimization, if %y is uniform, the result is calculated as %r = %x + %y * +0.0. This has a couple of problems: 1. If %y is Inf or NaN, this will return NaN instead of %x. 2. If %x is -0.0 and %y is positive, this will return +0.0 instead of -0.0. Avoid these problems by disabling the "%y is uniform" path if there are any uses of the result.
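A standalone host-side check (not from the patch) illustrating the two failure modes under IEEE-754 arithmetic:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

int main() {
  float x = -0.0f, y = 1.0f;
  // Problem 2: -0.0 + (1.0 * +0.0) rounds to +0.0, so the sign of %x is lost.
  assert(std::signbit(x) && !std::signbit(x + y * 0.0f));

  float inf = std::numeric_limits<float>::infinity();
  // Problem 1: inf * +0.0 is NaN, so the whole expression becomes NaN, not %x.
  assert(std::isnan(x + inf * 0.0f));
  return 0;
}
```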
2024-07-02[AMDGPU] Use nan as the identity for atomicrmw fmax/fmin (#97411)Jay Foad1-2/+5
atomicrmw fmax/fmin perform the same operation as llvm.maxnum/minnum which return the other operand if one operand is nan. This means that, in the presence of nan arguments, +/- inf is not an identity for these operations but nan is (at least if you don't care about nan payloads).
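A minimal sketch of selecting that identity (hedged; not necessarily the pass's exact code):

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/ErrorHandling.h"

using namespace llvm;

static Constant *getFPIdentity(AtomicRMWInst::BinOp Op, Type *Ty) {
  switch (Op) {
  case AtomicRMWInst::FMax:
  case AtomicRMWInst::FMin:
    // maxnum(NaN, x) == minnum(NaN, x) == x, so NaN acts as an identity even
    // when some lanes hold NaN (ignoring NaN payloads).
    return ConstantFP::getNaN(Ty);
  default:
    llvm_unreachable("not an fmax/fmin atomic");
  }
}
```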
2024-06-28[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)Nikita Popov1-2/+2
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, replacing the current `getParent()->getDataLayout()` pattern.
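A small before/after sketch of the new helper (the surrounding function is hypothetical):

```cpp
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Function.h"

using namespace llvm;

static unsigned pointerSizeInBits(const Function &F) {
  // Before: const DataLayout &DL = F.getParent()->getDataLayout();
  const DataLayout &DL = F.getDataLayout(); // new helper, same result
  return DL.getPointerSizeInBits();
}
```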
2024-06-26[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725)Vikram Hegde1-3/+3
These are incremental changes over #89217, with core logic being the same. This patch, along with #89217 and #91190, should get us ready to enable 64-bit optimizations in the atomic optimizer.
2024-06-25[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217)Vikram Hegde1-5/+5
This patch is intended to be the first of a series whose end goal is to adapt the atomic optimizer pass to support i64 and f64 operations (along with removing all unnecessary bitcasts). This legalizes 64-bit readlane, writelane and readfirstlane ops pre-ISel. --------- Co-authored-by: vikramRH <vikhegde@amd.com>
2024-06-24Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"Stephen Tozer1-1/+1
Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24[IR][NFC] Update IRBuilder to use InsertPosition (#96497)Stephen Tozer1-1/+1
Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.
2024-06-10[RFC][AMDGPU] Remove old llvm.amdgcn.buffer.* and tbuffer intrinsics (#93801)Jay Foad1-9/+0
They have been superseded by llvm.amdgcn.raw.buffer.* and llvm.amdgcn.struct.buffer.*.
2024-05-09[AMDGPU] Build lane intrinsics in a mangling-agnostic way. NFC. (#91583)Jay Foad1-11/+11
Use the form of CreateIntrinsic that takes an explicit return type and works out the mangling based on that and the types of the arguments. The advantage is that this still works if intrinsics are changed to have type mangling, e.g. if readlane/readfirstlane/writelane are changed to work on any type.
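A hedged sketch of the mangling-agnostic form: pass the desired return type and let the builder derive any mangling from it and the argument types (the helper name is hypothetical):

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"

using namespace llvm;

static Value *emitReadLane(IRBuilder<> &B, Value *V, Value *Lane) {
  // The builder works out any mangling from the return type and arguments,
  // so this keeps working if readlane later becomes overloaded on V's type.
  return B.CreateIntrinsic(V->getType(), Intrinsic::amdgcn_readlane, {V, Lane});
}
```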
2024-04-18[AMDGPU][AtomicOptimizer] Fix DT update for divergent values with Iterative strategy (#87605)Pierre van Houtryve1-9/+20
We take the terminator from EntryBB and put it in ComputeEnd. Make sure we also move the DT edges; we previously only did this assuming a non-conditional branch. Fixes SWDEV-453943.
2024-03-22[AMDGPU] Support double type in atomic optimizer. (#84307)Pravin Jagtap1-4/+7
Presently the atomic optimizer supports only 32-bit operations. The plan is to extend it to 64-bit operations for compute and graphics. This patch extends support for the double type for `uniform values` only. Going forward, we will extend the support to divergent values. Adding support for divergent values requires extending/legalizing the readfirstlane, readlane, writelane, etc. ops for 64-bit operations to avoid the `bitcast` noise that we have currently. --------- Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2023-09-13[AMDGPU] Fix scan of atomicFSub in AtomicOptimizer. (#66082)Pravin Jagtap1-5/+10
[D156301](https://reviews.llvm.org/D156301) introduced atomic optimizations for FAdd/FSub. For FSub, the reduction/scan needs to be performed using an add operation (not sub), and the memory location is later updated with the reduced value by a single lane using an atomic sub. --------- Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2023-09-11[NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilderJeremy Morse1-1/+1
This patch adds a two-argument SetInsertPoint method to IRBuilder that takes a block/iterator instead of an instruction, and updates many call sites to use it. The motivating reason for doing this is given here [0]: we'd like to pass around more information about the position of debug-info in the iterator object, which necessitates passing iterators around most of the time. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152468
2023-08-30[AMDGPU] Support FMin/FMax in AMDGPUAtomicOptimizer.Pravin Jagtap1-0/+14
Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D157388
2023-08-30[AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.Pravin Jagtap1-66/+138
Reduction and scan are implemented using the `Iterative` and `DPP` strategies for the `float` type. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D156301
2023-06-22[AMDGPU] Switch to the new cl option amdgpu-atomic-optimizer-strategy.Pravin Jagtap1-2/+11
The atomic optimizer is turned on by default through D152649. This patch removes the usage of the old command line option amdgpu-atomic-optimizations and transfers the responsibility to `amdgpu-atomic-optimizer-strategy`. We can safely remove the old option once LLPC removes all of its uses. Reviewed By: foad, arsenm, #amdgpu, cdevadas Differential Revision: https://reviews.llvm.org/D153007
2023-06-21[AMDGPU] Preserve dom-tree analysis in atomic optimizer.Pravin Jagtap1-4/+11
AMDGPUAtomicOptimizer updates the dominator tree whenever it modifies the control flow, so the analysis is preserved, similar to the legacy PM. Reviewed By: arsenm, yassingh, #amdgpu Differential Revision: https://reviews.llvm.org/D153349
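A minimal new-pass-manager sketch of how such preservation is typically reported; `optimizeAtomics` is a hypothetical stand-in for the pass body, which is assumed to keep the dominator tree updated in-place:

```cpp
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

// Assumed stand-in for the pass body; it is expected to update DT in-place
// whenever it changes the CFG (stubbed here so the sketch compiles).
static bool optimizeAtomics(Function &, DominatorTree &) { return false; }

static PreservedAnalyses runSketch(Function &F, FunctionAnalysisManager &AM) {
  DominatorTree &DT = AM.getResult<DominatorTreeAnalysis>(F);
  if (!optimizeAtomics(F, DT))
    return PreservedAnalyses::all();
  PreservedAnalyses PA;
  PA.preserve<DominatorTreeAnalysis>(); // DT was kept up to date, so keep it
  return PA;
}
```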
2023-06-20[AMDGPU] Use verify<domtree> instead of intra-pass asserts.Pravin Jagtap1-4/+0
Verifying the dominator tree with intra-pass asserts is expensive. The asserts added in D147408 were increasing the build time of libc significantly. This change does the verification after the atomic optimizer pass instead and should fix the regression reported in D153232. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D153261
2023-06-09[AMDGPU] Iterative scan implementation for atomic optimizer.Pravin Jagtap1-35/+173
This patch provides an alternative to the DPP implementation for scan computations. The iterative implementation loops over all active lanes of the wavefront using llvm.cttz and performs the following steps: 1. Read the value that needs to be atomically incremented using the llvm.amdgcn.readlane intrinsic. 2. Accumulate the result. 3. Update the scan result using the llvm.amdgcn.writelane intrinsic if intermediate scan results are needed later in the kernel. Reviewed By: arsenm, cdevadas Differential Revision: https://reviews.llvm.org/D147408
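A host-side model of the iterative scan described above (for illustration only; the pass itself emits IR that uses llvm.cttz, llvm.amdgcn.readlane and llvm.amdgcn.writelane inside a loop):

```cpp
#include <array>
#include <cstdint>

int main() {
  uint64_t Exec = 0b1011;                  // example active-lane mask (wave64)
  std::array<int, 64> LaneValue{};         // per-lane operand of the atomic
  LaneValue[0] = 3; LaneValue[1] = 5; LaneValue[3] = 7;
  std::array<int, 64> Scan{};              // exclusive prefix-scan per lane
  int Accum = 0;                           // running reduction

  for (uint64_t Active = Exec; Active; Active &= Active - 1) {
    unsigned Lane = __builtin_ctzll(Active); // lowest active lane (llvm.cttz)
    Scan[Lane] = Accum;                      // step 3: writelane of partial scan
    Accum += LaneValue[Lane];                // steps 1-2: readlane and accumulate
  }
  // Accum is the wave-wide sum used for the single atomic update; Scan[i] is
  // lane i's exclusive prefix, combined with the atomic's old value to form
  // that lane's result.
  return 0;
}
```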
2023-06-05[AMDGPU] Add buffer intrinsics that take resources as pointersKrzysztof Drewniak1-0/+18
In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backend), define versions of the raw and structured buffer intrinsics that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their rsrc arguments. The new intrinsics are named by replacing `buffer.` with `buffer.ptr`. One advantage to these intrinsic definitions is that, instead of specifying that a buffer load/store will read/write some memory, we can indicate that the memory read or written will be based on the pointer argument. This means that, for example, a read from a `noalias` buffer can be pulled out of a loop that is modifying a distinct buffer. In the future, we will define custom PseudoSourceValues that will allow us to package up the (buffer, index, offset) triples that buffer intrinsics contain and allow for more precise backend analysis. This work also enables creating address space 7, which represents manipulation of raw buffers using native LLVM load and store instructions. Where tests simply used a buffer intrinsic while testing some other code path (such as the tests for VGPR spills), they have been updated to use the new intrinsic form. Tests that are "about" buffer intrinsics (for instance, those that ensure that they codegen as expected) have been duplicated, either within existing files or into new ones. Depends on D145441 Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D147547
2023-05-23[BBUtils][NFC] Delete SplitBlockAndInsertIfThen with DT.Joshua Cao1-9/+12
The method is marked for deprecation. Delete the method and move all of its consumers to use the DomTreeUpdater version. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D149428
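A hedged sketch of the DomTreeUpdater-based call (the wrapper is hypothetical, and the exact signature may differ between releases):

```cpp
#include "llvm/Analysis/DomTreeUpdater.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"

using namespace llvm;

static Instruction *splitWithDTU(Value *Cond, Instruction *SplitBefore,
                                 DominatorTree &DT) {
  // Lazy strategy batches CFG updates and applies them on flush/destruction.
  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Lazy);
  // The deprecated overload taking a raw DominatorTree* is gone; pass a
  // DomTreeUpdater instead so the split keeps the tree consistent.
  return SplitBlockAndInsertIfThen(Cond, SplitBefore, /*Unreachable=*/false,
                                   /*BranchWeights=*/nullptr, &DTU);
}
```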
2023-04-20[NewPM][AMDGPU] Port amdgpu-atomic-optimizerPravin Jagtap1-29/+69
Reviewed By: arsenm, sameerds, gandhi21299 Differential Revision: https://reviews.llvm.org/D148628
2023-03-15[AMDGPU] Use UniformityAnalysis in AtomicOptimizerpvanhout1-9/+9
Adds & uses a new `isDivergentUse` API in UA. UniformityAnalysis now requires CycleInfo as well, since the new temporal divergence API can query it. ----- Original patch that adds `isDivergentUse` by @sameerds: The user of a temporally divergent value is marked as divergent in the uniformity analysis. But the same user may also have been marked divergent for other reasons, thus losing this information about temporal divergence. Some clients need to specifically check for temporal divergence. This change restores such an API, which already existed in DivergenceAnalysis. Reviewed By: sameerds, foad Differential Revision: https://reviews.llvm.org/D146018
2022-11-16AMDGPU: Create poison values instead of undefMatt Arsenault1-3/+3
These placeholders don't care about the finer points on the difference between the two.
2022-06-13[AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsicJay Foad1-0/+6
Compared to permlane16, permlane64 has no BC input because it has no boundary conditions, no fi input because the instruction acts as if FI were always enabled, and no OLD input because it always writes to every active lane. Also use the new intrinsic in the atomic optimizer pass. Differential Revision: https://reviews.llvm.org/D127662
2021-09-20[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.Jacob Lambert1-1/+1
2021-03-26[AMDGPU] Use reductions instead of scans in the atomic optimizerJay Foad1-10/+60
If the result of an atomic operation is not used then it can be more efficient to build a reduction across all lanes instead of a scan. Do this for GFX10, where the permlanex16 instruction makes it viable. For wave64 this saves a couple of dpp operations. For wave32 it saves one readlane (readlanes are generally bad for performance) and one dpp operation. Differential Revision: https://reviews.llvm.org/D98953
2021-03-19[AMDGPU] Remove some redundant code. NFC.Jay Foad1-19/+2
This is redundant because we have already checked that we can't handle divergent 64-bit atomic operands.
2021-03-19[AMDGPU] Skip building some IR if it won't be used. NFC.Jay Foad1-2/+4
2021-03-19[AMDGPU] Sink Intrinsic::getDeclaration calls to where they are used. NFC.Jay Foad1-12/+10
2021-03-03[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwmPiotr Sobczak1-2/+3
* Introduce the new intrinsic amdgcn_strict_wwm * Deprecate the old intrinsic amdgcn_wwm The change is done for consistency, as the "strict" prefix will become an important distinguishing factor between amdgcn_wqm and amdgcn_strict_wqm in the future. The "strict" prefix indicates that inactive lanes do not take part in control flow; specifically, an inactive lane enabled by a strict mode will always be enabled irrespective of control flow decisions. The amdgcn_wwm intrinsic will be removed, but doing so in two steps gives users time to switch to the new name at their own pace. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96257
2021-01-20[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargetsdfukalov1-1/+1
... to reduce header dependencies. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036
2021-01-07[NFC][AMDGPU] Reduce include files dependency.dfukalov1-1/+2
Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813
2020-09-30[AMDGPU] Do not generate mul with 1 in AMDGPU Atomic OptimizerMirko Brkusanin1-4/+9
Check if the operand of the mul is the constant value one for certain atomic instructions, in order to avoid generating unnecessary instructions when -amdgpu-atomic-optimizer is present. Differential Revision: https://reviews.llvm.org/D88315
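A hedged sketch of the check (names are hypothetical): for an atomic add of a uniform value V, the wave-wide value is V * popcount(exec), so the multiply can be skipped when V is the constant 1.

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/IRBuilder.h"

using namespace llvm;

static Value *scaleByActiveLanes(IRBuilder<> &B, Value *V, Value *Ctpop) {
  // Ctpop is assumed to hold the number of active lanes.
  if (auto *CI = dyn_cast<ConstantInt>(V))
    if (CI->isOne())
      return Ctpop; // V * popcount(exec) == popcount(exec) when V is 1
  return B.CreateMul(V, Ctpop);
}
```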
2020-05-29[SVE] Eliminate calls to default-false VectorType::get() from AMDGPUChristopher Tetreault1-1/+1
Reviewers: efriedma, david-arm, fpetrogalli, arsenm Reviewed By: david-arm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, tschuett, hiraditya, rkruppe, psnobl, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80328
2020-03-31[AMDGPU] New llvm.amdgcn.ballot intrinsicSebastian Neubauer1-3/+2
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function in GLSL and other shader languages. It returns a bitfield containing the result of its boolean argument in all active lanes, and zero in all inactive lanes. This is intended to replace the existing llvm.amdgcn.icmp and llvm.amdgcn.fcmp intrinsics after a suitable transition period. Use the new intrinsic in the atomic optimizer pass. Differential Revision: https://reviews.llvm.org/D65088
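A hedged sketch of emitting the new intrinsic from IRBuilder, assuming a wave64 target (ballot is overloaded on its result type):

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"

using namespace llvm;

static Value *emitBallot(IRBuilder<> &B, Value *Cond) {
  // Produces an i64 mask: Cond's value in every active lane, 0 in inactive lanes.
  return B.CreateIntrinsic(Intrinsic::amdgcn_ballot, {B.getInt64Ty()}, {Cond});
}
```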