path: root/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
Age | Commit message | Author | Files | Lines
2025-08-22 | [AMDGPU][NFC] Only include CodeGenPassBuilder.h where needed. (#154769) | Ivan Kosarev | 1 file | -0/+2
Saves around 125-210 MB of compilation memory usage per source for roughly one third of our backend sources, ~60 MB on average.
2025-07-31 | [AMDGPU] Remove `UnsafeFPMath` uses (#151079) | paperchalice | 1 file | -15/+8
Remove `UnsafeFPMath` uses in the AMDGPU backend. It blocks some bugfixes related to clang, and the ultimate goal is to remove the `resetTargetOptions` method in `TargetMachine`; see the FIXME in `resetTargetOptions`. See also https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast and https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract. Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-07-16 | [AMDGPU] Remove widen-16-bit-ops from CGP (#145483) | Pierre van Houtryve | 1 file | -294/+0
This was already off by default so there is no codegen change.
2025-06-03 | [ValueTracking] Make Depth last default arg (NFC) (#142384) | Ramkumar Ramachandra | 1 file | -12/+10
Having a finite Depth (or recursion limit) for computeKnownBits is very limiting, but is currently a load-bearing necessity, as all KnownBits are recomputed on each call and there is no caching. As a prerequisite for an effort to remove the recursion limit altogether, either using a clever caching technique or writing an easily-invalidable KnownBits analysis, make the Depth argument in ValueTracking APIs uniformly the last argument, with a default value. This aids in removing the argument when the time comes, as many callers that currently pass 0 explicitly are now updated to omit the argument altogether.
2025-06-02 | AMDGPUCodeGenPrepare.cpp - fix MSVC operator precedence warning. NFC. | Simon Pilgrim | 1 file | -2/+2
2025-05-29 | AMDGPU: Handle other fmin flavors in fract combine (#141987) | Matt Arsenault | 1 file | -5/+14
Since the input is either known not to be a NaN, or there is explicit use code checking whether the input is a NaN, any of the three fmin flavors is valid to match.
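As a rough illustration only (the helper name and the exact set of fmin intrinsics here are assumptions, not the commit's code), matching several fmin flavors can look like:
```cpp
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/Value.h"

using namespace llvm;
using namespace llvm::PatternMatch;

// Hypothetical helper: with an input known not to be NaN, the fmin
// intrinsic flavors all agree, so any of them can serve as the clamp in
// the fract pattern.
static bool matchAnyFMin(Value *V, Value *&A, Value *&B) {
  return match(V, m_Intrinsic<Intrinsic::minnum>(m_Value(A), m_Value(B))) ||
         match(V, m_Intrinsic<Intrinsic::minimum>(m_Value(A), m_Value(B))) ||
         match(V, m_Intrinsic<Intrinsic::minimumnum>(m_Value(A), m_Value(B)));
}
```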
2025-05-29 | [AMDGPU] Handle CreateBinOp not returning BinaryOperator (#137791) | anjenner | 1 file | -1/+4
AMDGPUCodeGenPrepareImpl::visitBinaryOperator() calls Builder.CreateBinOp() and casts the resulting Value to a BinaryOperator without checking, leading to an assert failure in a case found by fuzzing. In that case the operands are constant, and CreateBinOp constant-folds them, returning a Constant instead of a BinaryOperator.
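A minimal sketch of the guarded pattern, assuming the only instruction-specific work is flag propagation (the helper name is illustrative, not the patch's exact code):
```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/Support/Casting.h"

using namespace llvm;

// CreateBinOp may constant-fold constant operands and hand back a
// Constant, so the result must not be cast to BinaryOperator blindly.
static Value *createCheckedBinOp(IRBuilder<> &Builder,
                                 Instruction::BinaryOps Opc, Value *LHS,
                                 Value *RHS, BinaryOperator &Original) {
  Value *NewVal = Builder.CreateBinOp(Opc, LHS, RHS);
  // Only an actual instruction can carry IR flags copied from the original.
  if (auto *NewBinOp = dyn_cast<BinaryOperator>(NewVal))
    NewBinOp->copyIRFlags(&Original);
  return NewVal;
}
```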
2025-05-24 | [AMDGPU] Remove unused includes (NFC) (#141376) | Kazu Hirata | 1 file | -1/+0
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-05-16 | [AMDGPU] Do not promote uniform i16 operations to i32 in CGP (#140208) | Pierre van Houtryve | 1 file | -4/+4
For the majority of cases this is a neutral or positive change. There are even testcases that greatly benefit from it, but some regressions are possible. There is #140040 for GlobalISel that would need to be fixed, but it is only a one-instruction regression and can be fixed later. Solves #64591.
2025-05-02 | [IRBuilder] Add versions of createInsertVector/createExtractVector that take a uint64_t index. (#138324) | Craig Topper | 1 file | -2/+1
Most callers want a constant index. Instead of making every caller create a ConstantInt, we can do it in IRBuilder. This is similar to createInsertElement/createExtractElement.
2025-05-02 | [AMDGPU] Check for nonnull loads feeding addrspacecast (#138184) | Jay Foad | 1 file | -0/+5
Handle nonnull loads just like nonnull arguments when checking for addrspacecasts that are known never null.
2025-04-24 | [AMDGPU] Use variadic isa<>. NFC. (#137016) | Jay Foad | 1 file | -1/+1
2025-04-17 | Re apply 130577 narrow math for and operand (#133896) | Shoreshen | 1 file | -0/+77
Re-apply https://github.com/llvm/llvm-project/pull/130577, which was reverted in https://github.com/llvm/llvm-project/pull/133880. The old application failed under AddressSanitizer because `tryNarrowMathIfNoOverflow` was called after `I.eraseFromParent();` in `AMDGPUCodeGenPrepareImpl::visitBinaryOperator`, creating a use-after-free. To fix this, `tryNarrowMathIfNoOverflow` is now called first, and the visitor returns directly if it succeeds.
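A simplified sketch of the fixed ordering; the real combine takes more context, so its signature here is an assumption:
```cpp
#include "llvm/ADT/STLFunctionalExtras.h"
#include "llvm/IR/InstrTypes.h"

using namespace llvm;

// Simplified shape of the fix: run the narrowing combine first, and stop
// if it rewrote the instruction, because the original has already been
// erased inside the combine.
static bool visitBinaryOperatorSketch(
    BinaryOperator &I, function_ref<bool(BinaryOperator &)> TryNarrow) {
  if (TryNarrow(I))
    return true; // I may have been erased; do not touch it past this point.
  // ... remaining combines, some of which end with I.eraseFromParent() ...
  return false;
}
```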
2025-04-07 | [NFC][LLVM][AMDGPU] Cleanup pass initialization for AMDGPU (#134410) | Rahul Joshi | 1 file | -3/+1
- Remove calls to pass initialization from pass constructors.
- https://github.com/llvm/llvm-project/issues/111767
2025-04-01 | Revert "[AMDGPU][CodeGenPrepare] Narrow 64 bit math to 32 bit if profitable" (#133880) | Shoreshen | 1 file | -84/+0
Reverts llvm/llvm-project#130577.
2025-04-01 | [AMDGPU][CodeGenPrepare] Narrow 64 bit math to 32 bit if profitable (#130577) | Shoreshen | 1 file | -0/+84
For Add, Sub, and Mul with i64 type, if profitable, do the following (see the sketch after this list):
1. Trunc the operands to i32.
2. Apply the 32-bit Add/Sub/Mul.
3. Zext the result to i64.
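A minimal IRBuilder sketch of those three steps, assuming the profitability and no-overflow checks have already passed (the helper name is illustrative):
```cpp
#include "llvm/IR/IRBuilder.h"

using namespace llvm;

// Given a 64-bit add/sub/mul whose operands are known to fit in 32 bits,
// rewrite it as a 32-bit operation bracketed by trunc/zext.
static Value *narrowI64BinOp(IRBuilder<> &Builder, BinaryOperator &I) {
  Type *I32Ty = Builder.getInt32Ty();
  // 1. Truncate the operands to i32.
  Value *LHS = Builder.CreateTrunc(I.getOperand(0), I32Ty);
  Value *RHS = Builder.CreateTrunc(I.getOperand(1), I32Ty);
  // 2. Apply the 32-bit version of the same opcode.
  Value *Narrow = Builder.CreateBinOp(I.getOpcode(), LHS, RHS);
  // 3. Zero-extend the result back to i64.
  return Builder.CreateZExt(Narrow, I.getType());
}
```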
2025-03-31 | [IRBuilder] Add new overload for CreateIntrinsic (#131942) | Rahul Joshi | 1 file | -1/+1
Add a new `CreateIntrinsic` overload with no `Types`, useful for creating calls to non-overloaded intrinsics that don't need additional mangling.
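Illustration under the assumption that `llvm.amdgcn.workitem.id.x` stands in for any non-overloaded, zero-argument intrinsic:
```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"

using namespace llvm;

static Value *emitWorkitemIdX(IRBuilder<> &Builder) {
  // Before: Builder.CreateIntrinsic(Intrinsic::amdgcn_workitem_id_x,
  //                                 /*Types=*/{}, /*Args=*/{});
  // After: the new overload drops the empty Types list entirely.
  return Builder.CreateIntrinsic(Intrinsic::amdgcn_workitem_id_x, {});
}
```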
2025-03-28 | [Analysis][NFC] Extract KnownFPClass (#133457) | Tim Gymnich | 1 file | -0/+1
Extract KnownFPClass for future use inside of GISelKnownBits. Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-03-03 | [AMDGPU] Simplify conditional expressions. NFC. (#129228) | Jay Foad | 1 file | -2/+2
Simplify `cond ? val : false` to `cond && val` and similar.
2025-01-09 | [AMDGPU] Rework getDivNumBits API (#119768) | choikwa | 1 file | -24/+24
The rework involves:
- Return an unsigned value: the number of div/rem bits actually needed.
- Change from an AtLeast(SignBits) threshold to a MaxDivBits hint.
- Use the MaxDivBits hint for the unsigned case.
- Remove an unnecessary second early exit.
Mostly NFC changes.
2025-01-07 | [AMDGPU] Calculate getDivNumBits' AtLeast using bitwidth (#121758) | choikwa | 1 file | -1/+2
Previously, shrinkDivRem64 used the fixed value 32 for AtLeast, which meant that sub-64-bit divisions were rejected from shrinking, since the logic depended only on the number of sign bits. For example, 'idiv i48 %0, %1' would return 24 for the number of sign bits if %0 and %1 both had 24 division bits, and would be rejected.
2024-12-13 | PatternMatch: migrate to CmpPredicate (#118534) | Ramkumar Ramachandra | 1 file | -1/+1
With the introduction of CmpPredicate in 51a895a (IR: introduce struct with CmpInst::Predicate and samesign), PatternMatch is one of the first key pieces of infrastructure that must be updated to match a CmpInst respecting samesign information. Implement this change for the Cmp-matchers. This is a preparatory step in migrating the codebase over to CmpPredicate. Since no functional changes are desired at this stage, we have chosen not to migrate CmpPredicate::operator==(CmpPredicate) calls to use CmpPredicate::getMatching(), as that would have visible impact on tests that are not yet written; instead, we call CmpPredicate::operator==(Predicate), preserving the old behavior, while also inserting a few FIXME comments for follow-ups.
2024-12-12 | Reapply [AMDGPU] prevent shrinking udiv/urem if either operand exceeds signed max (#119325) | choikwa | 1 file | -10/+29
This reverts commit 254d206ee2a337cb38ba347c896f7c6a14c7f218. Added a fix in ExpandDivRem24 to disqualify cases where DivNumBits exceeds 24. Original commit & msg: ce6e955ac374f2b86cbbb73b2f32174dffd85f25. Handle the signed and unsigned paths differently in getDivNumBits. Using computeKnownBits, this rejects shrinking unsigned div/rem if the operands exceed the signed max, since we know NumSignBits will always be 0.
2024-12-09 | Revert "Reapply "[AMDGPU] prevent shrinking udiv/urem if either operand is in… (#118928)" | Joseph Huber | 1 file | -22/+6
This reverts commit 509893b58ff444a6f080946bd368e9bde7668f13. This broke the libc build again: https://lab.llvm.org/buildbot/#/builders/73/builds/9787.
2024-12-06 | Reapply "[AMDGPU] prevent shrinking udiv/urem if either operand is in (SignedMax,UnsignedMax] (#116733)" (#118928) | choikwa | 1 file | -6/+22
This reverts commit 905e831f8c8341e53e7e3adc57fd20b8e08eb999. Handle the signed and unsigned paths differently in getDivNumBits. Using computeKnownBits, this rejects shrinking unsigned div/rem if the operands exceed the signed max, since we know NumSignBits will always be 0. Rebased and re-attempted after the first attempt was reverted due to an unrelated failure in libc (which should be fixed by now, I'm told).
2024-12-03 | [AMDGPU] Refine AMDGPUCodeGenPrepareImpl class. NFC. (#118461) | Jay Foad | 1 file | -107/+91
Use references instead of pointers for most state, initialize it all in the constructor, and common up some of the initialization between the legacy and new pass manager paths.
2024-11-28 | [AMDGPU] Preserve all analyses if nothing changed (#117994) | Jay Foad | 1 file | -1/+3
2024-11-20 | Revert "[AMDGPU] prevent shrinking udiv/urem if either operand is in (SignedMax,UnsignedMax] (#116733)" | Joseph Huber | 1 file | -29/+13
This reverts commit b8e1d4dbea8905e48d51a70bf75cb8fababa4a60. Causes failures on the `libc` test suite: https://lab.llvm.org/buildbot/#/builders/73/builds/8871.
2024-11-20 | [AMDGPU] prevent shrinking udiv/urem if either operand is in (SignedMax,UnsignedMax] (#116733) | choikwa | 1 file | -13/+29
Do this by using ComputeKnownBits and checking for !isNonNegative and isUnsigned. This rejects shrinking unsigned div/rem if the operands exceed the signed max for the bitwidth, since we know NumSignBits will always be 0.
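A hedged sketch of the described check using computeKnownBits (the helper shape is an assumption, not the commit's code):
```cpp
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/Support/KnownBits.h"

using namespace llvm;

// For unsigned div/rem, an operand in (SignedMax, UnsignedMax] may have
// its top bit set, so there are no redundant sign bits to exploit and the
// narrowed path is invalid. Reject shrinking in that case.
static bool mayExceedSignedMax(const Value *Op, const DataLayout &DL) {
  KnownBits Known = computeKnownBits(Op, DL);
  return !Known.isNonNegative(); // sign bit not known to be clear
}
```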
2024-10-17 | [LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706) | Jay Foad | 1 file | -9/+5
Convert many instances of `Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...)` to the equivalent CreateIntrinsic call.
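The conversion looks roughly like this; the choice of `llvm.amdgcn.fract` as the example intrinsic is illustrative:
```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"

using namespace llvm;

// Before (two steps): materialize the declaration, then call it.
//   Function *Fn = Intrinsic::getOrInsertDeclaration(
//       M, Intrinsic::amdgcn_fract, {X->getType()});
//   Builder.CreateCall(Fn, {X});
// After (one step): CreateIntrinsic inserts the declaration and the call.
static Value *emitFract(IRBuilder<> &Builder, Value *X) {
  return Builder.CreateIntrinsic(Intrinsic::amdgcn_fract, {X->getType()},
                                 {X});
}
```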
2024-10-11 | [NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752) | Rahul Joshi | 1 file | -7/+8
Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation for adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e., just lookup, no creation).
2024-07-02 | AMDGPU: Fix assert from wrong address space size assumption (#97267) | Matt Arsenault | 1 file | -1/+8
This was assuming the source address space was at least as large as the destination of the cast. I'm not sure why this was casting to begin with; the assumption seems to be that the source address space from the root addrspacecast matches the underlying object, so directly check that. Fixes #97457.
2024-06-24 | Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)" | Stephen Tozer | 1 file | -1/+1
Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382. This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24 | [IR][NFC] Update IRBuilder to use InsertPosition (#96497) | Stephen Tozer | 1 file | -1/+1
Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, removing the need to pass a BasicBlock alongside a BasicBlock::iterator, since we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on, as we look to remove the `Instruction *InsertBefore` argument from instruction creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), it will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.
2024-03-27 | [AMDGPU] Fix missing `IsExact` flag when expanding vector binary operator (#86712) | Shilei Tian | 1 file | -0/+3
2024-03-19 | [AMDGCN] Use ZExt when handling indices in insertelement (#85718) | Peter Rong | 1 file | -2/+2
When i1 true is used as an index, SExt extends it to i32 -1. This would cause BitVector to overflow. The language manual has specified that the index shall be treated as an unsigned number; this patch fixes that. (https://llvm.org/docs/LangRef.html#insertelement-instruction) This patch fixes #85717. Signed-off-by: Peter Rong <PeterRong96@gmail.com>
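The gist as a small sketch: widen element indices with CreateZExt, since sign-extending `i1 true` yields -1 (the helper name is illustrative):
```cpp
#include "llvm/IR/IRBuilder.h"

using namespace llvm;

// An insertelement/extractelement index is unsigned per the LangRef, so a
// narrow index such as 'i1 true' must become 1, not -1, when widened.
static Value *widenIndex(IRBuilder<> &Builder, Value *Idx) {
  Type *I32Ty = Builder.getInt32Ty();
  // Wrong: CreateSExt(Idx, I32Ty) turns i1 true into i32 -1, overflowing
  // downstream bookkeeping such as a BitVector of lane indices.
  return Builder.CreateZExt(Idx, I32Ty);
}
```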
2024-03-18 | [RemoveDIs] Use getFirstNonPHIIt to fix crash #85472 (#85618) | Orlando Cazalet-Hyams | 1 file | -1/+1
2024-03-01 | [AMDGPU] Improve detection of non-null addrspacecast operands (#82311) | Pierre van Houtryve | 1 file | -0/+73
Use IR analysis to infer when an addrspacecast operand is nonnull, then lower it to an intrinsic that the DAG can use to skip the null check. I did this using an intrinsic as it's non-intrusive. An alternative would have been to allow something like `!nonnull` on `addrspacecast`, then lower that to a custom opcode (or add an operand to the addrspacecast MIR/DAG opcodes), but that's a lot of boilerplate for just one target's use case IMO. I'm hoping that when we switch to GISel we can move all this logic to the MIR level without losing info, but currently the DAG doesn't see enough, so we need to act in CGP. Fixes: SWDEV-316445
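A sketch of the lowering, assuming the `llvm.amdgcn.addrspacecast.nonnull` intrinsic added by this patch; the exact ordering of the two overloaded pointer types is an assumption:
```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"

using namespace llvm;

// When analysis proves the source pointer is never null, replace the plain
// addrspacecast with the nonnull intrinsic so ISel can skip the null check.
static Value *emitNonNullCast(IRBuilder<> &Builder, Value *Src,
                              Type *DstPtrTy) {
  return Builder.CreateIntrinsic(Intrinsic::amdgcn_addrspacecast_nonnull,
                                 {DstPtrTy, Src->getType()}, {Src});
}
```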
2024-02-06 | [AMDGPU] Use correct number of bits needed for div/rem shrinking (#80622) | choikwa | 1 file | -9/+12
There was an error where a dividend of type i64 whose actual number of used bits was 32 fell into a path that assumes only 24 bits are used. Check that the AtLeast field is used correctly when calling computeNumSignBits, and add the necessary extend/trunc for the 32-bit path. Regolden and update testcases. @jrbyrnes @bcahoon @arsenm @rampitec
2024-02-06 | [ValueTracking][NFC] Pass `SimplifyQuery` to `computeKnownFPClass` family (#80657) | Yingwei Zheng | 1 file | -1/+2
This patch refactors the interface of the `computeKnownFPClass` family to pass `SimplifyQuery` directly. The motivation is to compute known fpclass with `DomConditionCache`, which was introduced by https://github.com/llvm/llvm-project/pull/73662. With `DomConditionCache`, we can do more optimization with context-sensitive information. Example (extracted from [fmt/format.h](https://github.com/fmtlib/fmt/blob/e17bc67547a66cdd378ca6a90c56b865d30d6168/include/fmt/format.h#L3555-L3566)):
```
define float @test(float %x, i1 %cond) {
  %i32 = bitcast float %x to i32
  %cmp = icmp slt i32 %i32, 0
  br i1 %cmp, label %if.then1, label %if.else

if.then1:
  %fneg = fneg float %x
  br label %if.end

if.else:
  br i1 %cond, label %if.then2, label %if.end

if.then2:
  br label %if.end

if.end:
  %value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ]
  %ret = call float @llvm.fabs.f32(float %value)
  ret float %ret
}
```
We can prove the signbit of `%value` is always zero, so the fabs can be eliminated.
2023-12-13 | [AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030) | Piotr Sobczak | 1 file | -2/+2
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-11-30 | [AMDGPU] Don't create mulhi_24 in CGP (#72983) | Pierre van Houtryve | 1 file | -47/+13
Instead, create a mul24 with a 64 bit result and let ISel take care of it. This allows patterns to simply match mul24 even for 64-bit muls instead of having to match both mul/mulhi and a buildvector/bitconvert/etc.
2023-09-13 | AMDGPU: Avoid creating vector extracts if we aren't going to do anything | Matt Arsenault | 1 file | -5/+4
Try to avoid expensive-checks failures from reporting no changes when some dead instructions were introduced.
2023-09-12 | AMDGPU: Correctly lower llvm.sqrt.f32 | Matt Arsenault | 1 file | -9/+119
Make codegen emit a correctly rounded sqrt by default. Emit the fast (but only approximately accurate) expansion in AMDGPUCodeGenPrepare based on !fpmath, like the fdiv case. Hack around visitation-ordering problems from AMDGPUCodeGenPrepare using forward iteration instead of a well-behaved combiner. https://reviews.llvm.org/D158129
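A sketch of the !fpmath gating, mirroring how the fdiv case already consults accuracy metadata; the 2.5 ulp threshold here is an assumption for illustration only:
```cpp
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Operator.h"

using namespace llvm;

// Only emit the fast (less accurate) sqrt expansion when !fpmath permits
// it; otherwise leave the call for codegen's correctly rounded lowering.
static bool canUseFastSqrt(const CallInst &SqrtCall) {
  const auto *FPOp = cast<FPMathOperator>(&SqrtCall);
  float Accuracy = FPOp->getFPAccuracy(); // 0.0 means no !fpmath metadata.
  return Accuracy >= 2.5f; // assumed threshold, for illustration only
}
```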
2023-08-30 | AMDGPU: Fix sqrt fast math flags spreading to fdiv fast math flags | Matt Arsenault | 1 file | -4/+3
This was working around the lack of operator| on FastMathFlags. We have that now, which revealed the bug.
2023-08-23 | AMDGPU: Permit more rsq formation in AMDGPUCodeGenPrepare | Matt Arsenault | 1 file | -17/+17
We were basing the decision to defer the fast case to codegen on the fdiv itself, rather than looking for a foldable sqrt input. https://reviews.llvm.org/D158127
2023-08-11 | [AMDGPU] Clear BreakPhiNodesCache in-between functions | pvanhout | 1 file | -0/+1
Otherwise stale pointers pollute the cache, and when a dead PHI's memory is reused for another PHI, we can get a false-positive hit in the cache. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D157711
2023-08-03 | [AMDGPU] Break Large PHIs: Take whole PHI chains into account | pvanhout | 1 file | -45/+77
The previous heuristics had a big flaw: they only looked at a single PHI at a time, and didn't take the whole "chain" into account. The concept of a "chain" is important, because if we only break a chain partially, we risk forcing regalloc to reserve twice as many registers for that vector. We also risk adding a lot of copies that shouldn't be there, which can inhibit backend optimizations.

The solution is to consider the whole "PHI chain" when looking at a PHI. That is, we recursively look at the PHI's incoming values and users for other PHIs, then make a decision about the chain as a whole. The current threshold requires that at least `ceil(chain size * (2/3))` PHIs have at least one interesting incoming value. In simple terms, two-thirds (rounded up) of the PHIs should be breakable; for a chain of 7 PHIs, that is `ceil(7 * 2/3) = 5`.

This seems to work well. A lower threshold such as 50% is too aggressive, because chains often have 7 or 9 PHIs, and breaking 3+ or 4+ PHIs in those cases often causes performance issues. Fixes SWDEV-409648, SWDEV-398393, SWDEV-413487. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156414
2023-07-21 | [AMDGPU] Fix an unused variable warning | Kazu Hirata | 1 file | -2/+1
This patch fixes: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1006:9: error: unused variable 'Ty' [-Werror,-Wunused-variable]
2023-07-21 | AMDGPU: Fix variables only used in asserts | Matt Arsenault | 1 file | -4/+2