path: root/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
Age | Commit message | Author | Files | Lines
18 hours[AMDGPU][LowerBufferFatPointers] Erase dead ptr(7) intrinsics (#160798)Krzysztof Drewniak1-1/+3
Fix a crash that arose when intrinsics like llvm.masked.load.T.p7 were left in the module while AMDGPULowerBufferFatPointers was applied. In that case a captures(none) annotation would be applied to a non-pointer value, triggering a verifier failure. --------- Co-authored-by: Shilei Tian <i@tianshilei.me>
13 days[AMDGPU] Prevent re-visits in LowerBufferFatPointers (#159168)Krzysztof Drewniak1-0/+6
Fixes https://github.com/iree-org/iree/issues/22001 The visitor in SplitPtrStructs would re-visit instructions if an instruction earlier in program order caused a recursive visit() call via getPtrParts(). This would cause instructions to be processed multiple times. As a consequence of this, PHI nodes could be added to the Conditionals array multiple times, which would lead to a conditional that was already simplified being processed multiple times. After the code moved to InstSimplifyFolder, this re-processing, combined with more aggressive simplifications, would lead to an attempt to replace an instruction with itself, causing an assertion failure and crash. This commit resolves the issue and adds the reduced form of the crashing input as a test.
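The guard described above can be sketched in plain C++. This is a minimal illustration under stated assumptions, not the pass's actual code: `Node`, `Visitor`, `visit`, `run`, and the `Visited` set are hypothetical stand-ins, and only the name `Conditionals` is borrowed from the commit message.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an IR instruction. Operands must be processed
// before the node itself, which is what triggers recursive visits.
struct Node {
  int Id;
  std::vector<Node *> Operands;
  bool IsPhi;
};

struct Visitor {
  std::unordered_set<const Node *> Visited; // guards against re-visits
  std::vector<const Node *> Conditionals;   // PHI-like nodes to simplify later

  void visit(Node &N) {
    // insert() reports whether the node was newly added; stop if it was not.
    if (!Visited.insert(&N).second)
      return;
    for (Node *Op : N.Operands) {
      assert(Op && "operands must be non-null");
      visit(*Op); // recursive visit, analogous to the getPtrParts() path
    }
    if (N.IsPhi)
      Conditionals.push_back(&N);
  }

  // Walk nodes in program order. Later nodes may already have been reached
  // recursively from earlier ones, so visit() must tolerate repeats.
  void run(const std::vector<Node *> &Order) {
    for (Node *N : Order)
      visit(*N);
  }
};
```

Because the node is marked as visited before its operands are walked, a PHI that is reached both recursively and in program order is recorded in `Conditionals` only once, and a cycle through a PHI terminates.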
2025-08-22[AMDGPU][NFC] Only include CodeGenPassBuilder.h where needed. (#154769)Ivan Kosarev1-0/+2
Saves around 125-210 MB of compilation memory usage per source for roughly one third of our backend sources, ~60 MB on average.
2025-08-18[AMDGPU][LowerBufferFatPointers] Fix lack of rewrite when loading/storing null (#154128)Krzysztof Drewniak1-1/+5
Fixes #154056. The fat buffer lowering pass was erroneously detecting that it did not need to run on functions that only load/store to the null constant (or other such constants). We thought this would be covered by specializing constants out to instructions, but that doesn't account for trivial constants like null. Therefore, we check the operands of instructions for buffer fat pointers in order to find such constants and ensure the pass runs. --------- Co-authored-by: Nikita Popov <github@npopov.com>
2025-08-12[RemoveDIs][AMDGPU] Replace defunct getAssignmentMarkers call (#153212)Orlando Cazalet-Hyams1-2/+2
Not quite NFC as it looks like the original intrinsic-handling code never got updated to use records. This was never caught because that code wasn't tested. I've adjusted an existing test so the behaviour is now covered.
2025-08-09[AMDGPULowerBufferFatPointers] Handle ptrtoaddr by extending the offsetAlexander Richardson1-0/+16
Reviewed By: krzysz00 Pull Request: https://github.com/llvm/llvm-project/pull/139413
2025-07-21[DebugInfo] Remove intrinsic-flavours of findDbgUsers (#149816)Jeremy Morse1-2/+1
This is one of the final remaining debug-intrinsic specific codepaths out there, and pieces of cross-LLVM infrastructure to do with debug intrinsics.
2025-07-21[DebugInfo][AMDGPU] Convert a debug-intrinsic method to debug records (#149505)Jeremy Morse1-8/+7
It appears this wasn't handled in the initial migration a year ago, seemingly because it didn't lead to any test failures. Find and interpret debug records in the same way the original code handled intrinsics. Note that we drop a call to copyMetadata: debug records can't carry additional metadata like instructions, nothing relies on this in AMDGPU AFAIUI.
2025-06-21AMDGPU: Use reportFatalUsageError in AMDGPULowerBufferFatPointers (#145132)Matt Arsenault1-21/+26
2025-06-12[DebugInfo][RemoveDIs] Delete debug-info-format flag (#143746)Jeremy Morse1-1/+0
This flag was used to let us incrementally introduce debug records into LLVM, however everything is now using records. It serves no purpose now, so delete it.
2025-05-30AMDGPU: Start using LLVMContext errors in buffer fat pointer lowering (#142014)Matt Arsenault1-7/+16
Avoid using report_fatal_error. Many more uses that should be converted in the pass remain.
2025-05-28Reland "Add macro to suppress -Wunnecessary-virtual-specifier" (#141091)Devon Loehr1-1/+1
This fixes #139614 on non-clang compilers by moving `__has_warning` completely inside the `#if defined(__clang__)` block. This prevents a parse failure from compilers which don't recognize `__has_warning`. Original description: Followup to #138741. This adds the requested macro to silence `-Wunnecessary-virtual-specifier` when declaring virtual anchor functions in `final` classes, per [LLVM policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers). It also cleans up any remaining instances of the warning, allowing us to stop disabling it when we build LLVM.
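The guard arrangement described above can be sketched as follows. The macro name `SKETCH_VIRTUAL_ANCHOR` and the `Base`/`Leaf` classes are hypothetical and chosen only for illustration; the point is that `__has_warning` appears only inside a nested `#if` that non-clang preprocessors skip without evaluating.

```cpp
// Nesting matters: a compiler without __has_warning would fail to parse
// `#if defined(__clang__) && __has_warning(...)`, but it never evaluates the
// inner #if when the outer condition is false.
#if defined(__clang__)
#if __has_warning("-Wunnecessary-virtual-specifier")
#define SKETCH_VIRTUAL_ANCHOR()                                               \
  _Pragma("clang diagnostic push")                                            \
  _Pragma("clang diagnostic ignored \"-Wunnecessary-virtual-specifier\"")     \
  virtual void anchor()                                                       \
  _Pragma("clang diagnostic pop")
#endif
#endif

// Fallback for every other compiler (and for clang versions without the
// warning): a plain declaration with no pragmas.
#ifndef SKETCH_VIRTUAL_ANCHOR
#define SKETCH_VIRTUAL_ANCHOR() virtual void anchor()
#endif

struct Base {
  virtual ~Base() = default;
};

// A final class that still declares a virtual anchor function.
struct Leaf final : Base {
  SKETCH_VIRTUAL_ANCHOR();
  int value() const { return 42; }
};

// Defining the anchor out of line pins the vtable to one translation unit.
void Leaf::anchor() {}
```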
2025-05-24[AMDGPU] Remove unused includes (NFC) (#141376)Kazu Hirata1-1/+0
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-05-21Revert "Add macro to suppress -Wunnecessary-virtual-specifier (#139614)"Philip Reames1-1/+1
This reverts commit 0954c9d487e7cb30673df9f0ac125f71320d2936. It breaks the build when built with gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04).
2025-05-21Add macro to suppress -Wunnecessary-virtual-specifier (#139614)Devon Loehr1-1/+1
Followup to #138741. This adds the requested macro to silence `-Wunnecessary-virtual-specifier` when declaring virtual anchor functions in `final` classes, per [LLVM policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers). It also cleans up any remaining instances of the warning, allowing us to stop disabling it when we build LLVM.
2025-05-20[AMDGPU][LowerBufferFatPointers] Handle addrspacecast null to p7 (#140775)Krzysztof Drewniak1-4/+32
Some application code operating on generic pointers (that then get initialized to buffer fat pointers) may perform tests against nullptr. After address space inference, this results in comparisons against `addrspacecast (ptr null to ptr addrspace(7))`, which were crashing. However, while general casts to ptr addrspace(7) from generic pointers aren't supported, it is possible to cast null pointers to the all-zeros buffer resource and 0 offset, which this patch adds. It also adds a TODO for casting _out_ of buffer resources, which isn't implemented here but could be.
2025-05-19[AMDGPU] Add a new amdgcn.load.to.lds intrinsic (#137425)Krzysztof Drewniak1-0/+20
This PR adds an amdgcn_load_to_lds intrinsic that abstracts over loads to LDS from global (address space 1) pointers and buffer fat pointers (address space 7), since they use the same API and "gather from a pointer to LDS" is something of an abstract operation. This commit adds the intrinsic and its lowerings for addrspaces 1 and 7, and updates the MLIR wrappers to use it (loosening up the restrictions on loads to LDS along the way to match the ground truth from target features). It also plumbs the intrinsic through to clang.
2025-04-30Reland [llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#137701)Jonathan Thackray1-0/+10
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum` instructions. These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but are atomic and use IEEE754 2019 handling for NaNs, which is different to `fmax` and `fmin`. See: https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic for more details. Future changes will allow this LLVM IR to be lowered to specialised assembler instructions on suitable targets, such as AArch64.
2025-04-29[NFC][AMDGPU] Drop recursive types in LowerBufferFatPointers (#137735)Krzysztof Drewniak1-20/+4
Now that IRMover and the rest of LLVM don't allow recursive types, drop support for them from the clone of the IRMover code used when lowering buffer fat pointer operations.
2025-04-28Revert "[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions" (#137657)Jonathan Thackray1-10/+0
Reverts llvm/llvm-project#136759 due to bad interaction with c792b25e4
2025-04-28[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#136759)Jonathan Thackray1-0/+10
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum` instructions. These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but are atomic and use IEEE754 2019 handling for NaNs, which is different to `fmax` and `fmin`. See: https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic for more details. Future changes will allow this LLVM IR to be lowered to specialised assembler instructions on suitable targets, such as AArch64.
2025-04-24[AMDGPU] Use variadic isa<>. NFC. (#137016)Jay Foad1-2/+2
2025-04-19[AMDGPU] Construct SmallVector with iterator ranges (NFC) (#136415)Kazu Hirata1-3/+2
2025-04-08Revert "[AMDGPU] Add buffer.fat.ptr.load.lds intrinsic wrapping raw rsrc version (#133015)" (#134871)Krzysztof Drewniak1-21/+0
This reverts commit d1a05721172272f7aab685b56d99e86814a15bff. There was further discussion on the PR about whether the intrinsics should exist in this form.
2025-04-07[NFC][LLVM][AMDGPU] Cleanup pass initialization for AMDGPU (#134410)Rahul Joshi1-4/+1
- Remove calls to pass initialization from pass constructors. - https://github.com/llvm/llvm-project/issues/111767
2025-04-07[AMDGPU] Add buffer.fat.ptr.load.lds intrinsic wrapping raw rsrc version (#133015)Krzysztof Drewniak1-0/+21
Add a buffer_fat_ptr_load_lds intrinsic, by analogy with global_load_lds, which enables using `ptr addrspace(7)` to set the rsrc and offset arguments to raw_ptr_buffer_load_lds.
2025-04-03[AMDGPULowerBufferFatPointers] Use InstSimplifyFolder during rewrites (#134137)Krzysztof Drewniak1-22/+32
This PR updates AMDGPULowerBufferFatPointers to use the InstSimplifyFolder when creating IR during buffer fat pointer lowering. This shouldn't cause any large functional changes and might improve the quality of the generated code.
2025-03-12[AMDGPU] Change placeholder from `undef` to `poison` (#130858)Pedro Lobo1-1/+1
Replace `undef` debug info with `poison`.
2025-02-27Reapply "[AMDGPU] Handle memcpy()-like ops in LowerBufferFatPointers (#126621)" (#129078)Krzysztof Drewniak1-24/+98
This reverts commit 1559a65efaf327f9c72e14d4bb1834f076e7fc20. Fixed test (I suspect broken by unrelated change in the merge)
2025-02-26Revert "[AMDGPU] Handle memcpy()-like ops in LowerBufferFatPointers (#126621)"Kazu Hirata1-98/+24
This reverts commit 469757efafebdd5772d993fca4dc0dfa7cbda17c. Multiple buildbot failures have been reported: https://github.com/llvm/llvm-project/pull/126621
2025-02-26[AMDGPU] Handle memcpy()-like ops in LowerBufferFatPointers (#126621)Krzysztof Drewniak1-24/+98
Since LowerBufferFatPointers runs before PreISelIntrinsicLowering, which normally handles unsupported memcpy()s, and since you can't have a `noalias {ptr addrspace(8), i32}` because it crashes later passes, manually expand memcpy()s involving buffer fat pointers to loops. Additionally, though they're unlikely to be used, this commit adds support for memset(). This commit doesn't implement writing direct-to-LDS loads as the intrinsics, but leaves the option in the future.
2025-02-18[AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (#126828)Krzysztof Drewniak1-0/+20
Attempting to pass a `ptr addrspace(7)` to functions that take `ptr` arguments produces undesirable `addrspacecast(addrspacecast(p8 x to p7) to p0) => addrspacecast(p8 x to p0)` folds. This results in illegal GEP operations on buffer resources, which can't be GEP'd. (However, note that, while unimplemented, addrspacecast from ptr addrspace(7) to ptr is legal - it's just an effective address computation) To resolve this problem, and thus prevent illegal `getelementptr T, ptr addrspace(8) %x, ...`s from being produced, this commit extends amdgcn.make.buffer.rsrc to also be variadic in its result type, auto-upgrading old manglings. The logic for handling a make.buffer.rsrc in instruction selection remains untouched and expects the output type to be a ptr addrspace(8), as does the Clang lowering for its builtin (the pointer-to-pointer version might want a different name in clang). LowerBufferFatPointers has been updated to lower amdgcn.make.buffer.rsrc.p7.p* to amdgcn.make.buffer.rsrc.p8.p* . This'll also make exposing buffer fat pointers in Clang easier, since you don't have to cast between a `__amdgcn_rsrc_t` and a pointer.
2025-02-11[LowerBufferFatPointers] Fix support for GEP T, p7, <N x T> idxs (#126126)Krzysztof Drewniak1-4/+15
The lowering for GEP didn't properly support the case where the pointer argument was being implicitly broadcast by a vector of indices. Fix that. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-01-20Reapply "[AMDGPU] Handle natively unsupported types in addrspace(7) lowering" (#123660) (#123657)Krzysztof Drewniak1-3/+564
This reverts commit 64749fb01538fba2b56d9850497d5f3a626cabc2. Adds a constructor to VecSlice to address the failure
2025-01-20Revert "[AMDGPU] Handle natively unsupported types in addrspace(7) lowering" (#123657)Krzysztof Drewniak1-562/+3
Reverts llvm/llvm-project#110572 Seem to have broken a buildbot, not sure why https://lab.llvm.org/buildbot/#/builders/108/builds/8346
2025-01-20[AMDGPU] Handle natively unsupported types in addrspace(7) lowering (#110572)Krzysztof Drewniak1-3/+562
The current lowering for ptr addrspace(7) assumed that the instruction selector can handle arbitrary LLVM types, which is not the case. Code generation can't deal with - Values that aren't 8, 16, 32, 64, 96, or 128 bits long - Aggregates (this commit only handles arrays of scalars, more may come) - Vectors of more than one byte - 3-word values that aren't a vector of 3 32-bit values (for example, a <6 x half>) This commit adds a buffer contents type legalizer that adds the needed bitcasts, zero-extensions, and splits into subcomponents needed to convert a load or store operation into one that can be successfully lowered through code generation. In the long run, some of the involved bitcasts (though potentially not the buffer operation splitting) ought to be handled by the instruction legalizer, but SelectionDAG makes this difficult. It also takes advantage of the new `nuw` flag on `getelementptr` when lowering GEPs to offset additions. We don't currently plumb through `nsw` on GEPs since that should likely be a separate change and would require declaring what we mean by "the address" in the context of the GEP guarantees.
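The width restriction listed above can be illustrated with a small standalone sketch. The function names `isLegalBufferWidth` and `splitIntoLegalWidths` are hypothetical, and the greedy split is an assumption made for illustration: the real legalizer also deals with sub-byte sizes, aggregates, and vector element types, and may choose pieces differently.

```cpp
#include <cassert>
#include <vector>

// Widths (in bits) that the commit message lists as directly supported.
bool isLegalBufferWidth(unsigned Bits) {
  switch (Bits) {
  case 8:
  case 16:
  case 32:
  case 64:
  case 96:
  case 128:
    return true;
  default:
    return false;
  }
}

// Illustrative greedy split of a whole-byte value into legal-width pieces,
// largest piece first. Not the algorithm used by the pass.
std::vector<unsigned> splitIntoLegalWidths(unsigned Bits) {
  assert(Bits % 8 == 0 && "sketch only handles whole bytes");
  static const unsigned Legal[] = {128, 96, 64, 32, 16, 8};
  std::vector<unsigned> Parts;
  while (Bits > 0) {
    for (unsigned W : Legal) {
      if (W <= Bits) { // take the widest legal piece that still fits
        Parts.push_back(W);
        Bits -= W;
        break;
      }
    }
  }
  return Parts;
}
```

Under these assumptions a 160-bit value splits into a 128-bit and a 32-bit access, and a 24-bit value into a 16-bit and an 8-bit access.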
2025-01-14[AMDGPULowerBufferFatPointers] Use typeIncompatible() (#122902)Nikita Popov1-10/+6
Use typeIncompatible() to drop attributes incompatible with the new argument/return type, instead of keeping a custom list.
2025-01-14[AMDGPU] Handle nontemporal and amdgpu.last.use metadata in amdgpu-lower-buffer-fat-pointers (#120139)Acim Maravic1-12/+0
2024-11-06[AMDGPU] Support `nuw` and `nusw` in buffer fat pointer lowering (#115039)Krzysztof Drewniak1-2/+3
This commit uses the `nuw` flag on `getelementptr` to set the `nuw` flag on buffer offset additions, and also moves from `inbounds` to the looser `nusw` for the existing case.
2024-09-06[AMDGPU] Work around a warningKazu Hirata1-0/+3
This patch works around: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:1101:13: error: enumeration values 'USubCond' and 'USubSat' not handled in switch [-Werror,-Wswitch] I've notified the author in #105568.
2024-07-22[AMDGPU] Add intrinsic for raw atomic buffer loads (#97707)Jessica Del1-2/+3
Upstream the intrinsics `llvm.amdgcn.raw.atomic.buffer.load` and `llvm.amdgcn.raw.atomic.ptr.buffer.load`. These additional intrinsics mark atomic buffer loads as atomic to LLVM by removing the `IntrReadMem` attribute. Otherwise, it could hoist these intrinsics out of loops in cases where LLVM marks them as invariant. That can cause issues such as infinite loops. Continuation of https://reviews.llvm.org/D138786 with the additional use in the fat buffer lowering, more test cases and the additional ptr versions of these intrinsics. --------- Co-authored-by: rtayl <> Co-authored-by: Jay Foad <jay.foad@amd.com> Co-authored-by: Mariusz Sikora <mariusz.sikora@amd.com>
2024-07-16[AMDGPU] Use member initializers. NFC.Jay Foad1-2/+2
2024-06-27[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)Nikita Popov1-4/+4
This is a helper to avoid writing `getModule()->getDataLayout()`. I regularly try to use this method only to remember it doesn't exist... `getModule()->getDataLayout()` is also a common (the most common?) reason why code has to include the Module.h header.
2024-06-17[AMDGPULowerBufferFatPointers] Expand const exprs using fat pointers (#95558)Nikita Popov1-157/+45
Expand all constant expressions that use fat pointers upfront, so that the rewriting logic only has to deal with instructions and not the constant expression variants as well. My primary motivation is to remove the creation of illegal constant expressions (mul and shl) from this pass, but this also cuts down quite a bit on the amount of duplicate logic.
2024-06-14[AMDGPULowerBufferFatPointers] Fix offset-only ptrtoint (#95543)Nikita Popov1-14/+16
For ptrtoint that truncates to the offset only, the expansion generated a shift by the bit width, which is poison. Instead, we should return the offset directly. (The same problem exists for the constant expression case, but I plan to address that separately, and more comprehensively.)
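The special case described above can be sketched with a scaled-down model. Everything here is an assumption for illustration: `FatPtr` and `ptrToInt` are hypothetical names, and the resource part is shrunk to 32 bits so the combined value fits in a `std::uint64_t`. In LLVM IR the shift is performed in the result type, so a result no wider than the offset would shift by at least its own bit width, which is poison; the early return avoids that shift entirely.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned OffsetBits = 32;

// Scaled-down model of a buffer fat pointer: resource in the high bits,
// offset in the low 32 bits.
struct FatPtr {
  std::uint32_t Rsrc;
  std::uint32_t Offset;
};

// Convert the pointer to an integer of ResultBits bits (1..64).
std::uint64_t ptrToInt(FatPtr P, unsigned ResultBits) {
  assert(ResultBits >= 1 && ResultBits <= 64);
  const std::uint64_t Mask =
      ResultBits == 64 ? ~0ULL : ((1ULL << ResultBits) - 1);
  // Result holds only (part of) the offset: return it directly, no shift.
  if (ResultBits <= OffsetBits)
    return P.Offset & Mask;
  // Otherwise place the resource above the offset and truncate.
  const std::uint64_t Hi = static_cast<std::uint64_t>(P.Rsrc) << OffsetBits;
  return (Hi | P.Offset) & Mask;
}
```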
2024-06-14[AMDGPULowerBufferFatPointers] Don't try to preserve flags for constant expressionsNikita Popov1-13/+6
We expect all of these ConstantExpr ctors to fold away, don't try to preserve flags, especially as the flags are not correct.
2024-06-12[AMDGPULowerBufferFatPointers] Restore zero offset special caseNikita Popov1-2/+3
OffAccum will never be nullptr now, instead check for a zero constant.
2024-06-12[AMDGPULowerBufferFatPointers] Simplify and fix GEP offset emission (#95115)Nikita Popov1-41/+9
Use emitGEPOffset() to emit the GEP offset, which already has all the necessary logic. This also fixes the nuw flag incorrectly being set on the offset calculation, while only nsw is implied by inbounds.
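A small worked example shows why `nuw` does not follow from `inbounds` while `nsw` does. The helper names below are hypothetical, and 32-bit indices are assumed for illustration: an index of -1 scaled by an element size of 4 gives -4 without signed wrap, but the same bit pattern read as unsigned wraps.

```cpp
#include <cstdint>

// True if Index * Scale wraps when both are treated as unsigned 32-bit values.
bool mulWrapsUnsigned(std::uint32_t Index, std::uint32_t Scale) {
  const std::uint64_t Wide = static_cast<std::uint64_t>(Index) * Scale;
  return Wide > UINT32_MAX;
}

// True if Index * Scale wraps when both are treated as signed 32-bit values.
bool mulWrapsSigned(std::int32_t Index, std::int32_t Scale) {
  const std::int64_t Wide = static_cast<std::int64_t>(Index) * Scale;
  return Wide < INT32_MIN || Wide > INT32_MAX;
}
```

Calling `mulWrapsSigned(-1, 4)` reports no wrap, while `mulWrapsUnsigned(static_cast<std::uint32_t>(-1), 4)` reports one, so tagging that multiplication `nuw` would claim a guarantee the GEP never gave.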
2024-05-27[IR] Add getelementptr nusw and nuw flags (#90824)Nikita Popov1-1/+1
This implements the `nusw` and `nuw` flags for `getelementptr` as proposed at https://discourse.llvm.org/t/rfc-add-nusw-and-nuw-flags-for-getelementptr/78672. The three possible flags are encapsulated in the new `GEPNoWrapFlags` class. Currently this class has a ctor from bool, interpreted as the InBounds flag. This ctor should be removed in the future, as code gets migrated to handle all flags. There are a few places annotated with `TODO(gep_nowrap)`, where I've had to touch code but opted to not infer or precisely preserve the new flags, so as to keep this as NFC as possible and make sure any changes of that kind get test coverage when they are made.
2024-04-04AMDGPULowerBufferFatPointers.cpp - fix Wunused-variable warning. NFC.Simon Pilgrim1-1/+1