path: root/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
Age  Commit message  Author  Files  Lines
2025-08-22[AMDGPU][NFC] Only include CodeGenPassBuilder.h where needed. (#154769)Ivan Kosarev1-0/+2
Saves around 125-210 MB of compilation memory usage per source for roughly one third of our backend sources, ~60 MB on average.
2025-08-20[AMDGPU] Upstream the Support for array of named barriers (#154604)Gang Chen1-6/+9
2025-08-04[llvm] using wrapper llvm::sort(nfc) (#151000)Austin1-1/+1
using wrapper llvm::sort(nfc)
2025-06-21AMDGPU: Use reportFatalUsageError in AMDGPULowerModuleLDS (#145130)Matt Arsenault1-4/+5
2025-05-24[AMDGPU] Remove unused includes (NFC) (#141376)Kazu Hirata1-1/+0
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-05-17Revert "[AMDGPU] Add flag to prevent reruns of LowerModuleLDS (#129520)"Shilei Tian1-6/+0
This reverts commit aa9f8596b01fef013ab62c20e61fc96d165f60f7 because it made some assumptions that may not be valid.
2025-05-15[AMDGPU] Add flag to prevent reruns of LowerModuleLDS (#129520)Pierre van Houtryve1-0/+6
FullLTO has to run this early, before module splitting occurs; otherwise module splitting won't work as expected. There was a targeted fix for Fortran on another branch that disables the LTO run, but that would break full LTO module splitting entirely. Test changes are due to metadata indexes shifting. See #122891
2025-05-03[llvm] Use *(Set|Map)::contains (NFC) (#138431)Kazu Hirata1-1/+1
2025-04-28[AMDGPU] Correctly merge noalias scopes during lowering of LDS data. (#131664)Sirish Pande1-1/+31
Currently, if noalias metadata is already present on loads and stores, the lower module LDS pass generates a more conservative aliasing set. This inhibits scheduling intrinsics that would otherwise have generated a better pipelined instruction. The fix is not to always intersect the existing noalias metadata with the noalias metadata created for lowering of LDS, but to intersect only if the noalias scopes are from the same domain, and otherwise to concatenate the existing noalias sets with the LDS noalias.
There are a few patches that have come in for scopedAA in the past. The following three should be enough background information: https://reviews.llvm.org/D91576 https://reviews.llvm.org/D108315 https://reviews.llvm.org/D110049
Essentially, after a pass that might change aliasing info, one should check whether that pass changes the number of MayAlias or ModRef results using the following: `opt -S -aa-pipeline=basic-aa,scoped-noalias-aa -passes=aa-eval -evaluate-aa-metadata -print-all-alias-modref-info -disable-output`
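The merge rule described in this commit message can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `Scope`, `ScopeList`, `sharesDomain`, and `mergeNoAlias` are hypothetical names invented for illustration, plain structs stand in for LLVM metadata nodes, and none of it is the pass's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a noalias scope: a name inside a domain.
struct Scope {
  std::string Domain;
  std::string Name;
  bool operator==(const Scope &O) const {
    return Domain == O.Domain && Name == O.Name;
  }
};
using ScopeList = std::vector<Scope>;

// True if any scope in A lives in the same domain as some scope in B.
bool sharesDomain(const ScopeList &A, const ScopeList &B) {
  for (const Scope &X : A)
    for (const Scope &Y : B)
      if (X.Domain == Y.Domain)
        return true;
  return false;
}

// Same domain: keep only the scopes present in both lists (intersection).
// Different domains: keep everything (concatenation), so the pre-existing
// noalias information is not weakened by the LDS lowering.
ScopeList mergeNoAlias(const ScopeList &Existing, const ScopeList &Lds) {
  ScopeList Result;
  if (sharesDomain(Existing, Lds)) {
    for (const Scope &S : Existing)
      if (std::find(Lds.begin(), Lds.end(), S) != Lds.end())
        Result.push_back(S);
    return Result;
  }
  Result = Existing;
  Result.insert(Result.end(), Lds.begin(), Lds.end());
  return Result;
}
```

In this model, merging a list from a "user" domain with a list from an "lds" domain keeps every scope from both, while merging two lists from the same domain keeps only their common scopes.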
2025-04-07[NFC][LLVM][AMDGPU] Cleanup pass initialization for AMDGPU (#134410)Rahul Joshi1-4/+2
- Remove calls to pass initialization from pass constructors. - https://github.com/llvm/llvm-project/issues/111767
2025-03-31[IRBuilder] Add new overload for CreateIntrinsic (#131942)Rahul Joshi1-2/+1
Add a new `CreateIntrinsic` overload with no `Types`, useful for creating calls to non-overloaded intrinsics that don't need additional mangling.
2024-11-06[AMDGPU] Fix a warningKazu Hirata1-2/+1
This patch fixes: llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp:1031:17: error: unused variable 'F' [-Werror,-Wunused-variable]
2024-11-06[AMDGPU] modify named barrier builtins and intrinsics (#114550)Gang Chen1-0/+124
Use a local pointer type to represent the named barrier in builtins and intrinsics. This makes the definitions more user friendly because users do not need to worry about the hardware ID assignment. This approach is also closer to other popular GPU programming languages. Named barriers should be represented as global variables of addrspace(3) in LLVM-IR. The compiler assigns the special LDS offsets for those variables during the AMDGPULowerModuleLDS pass. Those addresses are converted to hw barrier IDs during instruction selection. The rest of the instruction-selection changes are primarily due to the intrinsic-definition changes.
2024-10-17[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)Jay Foad1-4/+2
Convert many instances of: Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...) to the equivalent CreateIntrinsic call.
2024-10-11[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752)Rahul Joshi1-4/+4
Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation of adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e, just lookup, no creation).
2024-10-03[AMDGPU] Qualify auto. NFC. (#110878)Jay Foad1-4/+4
Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)
2024-09-04[AMDGPU][LDS] Fix dynamic LDS interaction with "amdgpu-no-lds-kernel-id" (#107092)Juan Manuel Martinez Caamaño1-7/+8
Dynamic lds and Table lds both use the amdgpu_lds_kernel_id intrinsic. Kernels and functions that make an indirect use of this should not have the "amdgpu-no-lds-kernel-id" attribute. For the latter, this was done. For the dynamic lds case, this was missing. This patch fixes it.
2024-08-28[AMDGPU] Don't realign already allocated LDS. Point fix for 106412 (#106421)Jon Chesterfield1-0/+5
Fixes 106412. The logic that skips the pass on already-lowered variables doesn't cover the path that increases alignment of variables. If a variable is allocated at 24 and then given 16 byte alignment, the backend notices and fatal-errors on the inconsistency.
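The inconsistency described here (a variable already placed at address 24 that is later given 16-byte alignment) can be shown with a minimal standalone model. This is only a sketch: `LdsVariable`, `isPlacementConsistent`, and `raiseAlignment` are hypothetical names, and the code does not reproduce the pass's real data structures.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an LDS variable that may already have an address.
struct LdsVariable {
  bool IsAllocated;  // true once an absolute address has been assigned
  uint64_t Address;  // meaningful only when IsAllocated is true
  uint64_t Align;    // required alignment in bytes
};

// An allocated variable is consistent only if its address honours its
// alignment. Unallocated variables are trivially consistent.
bool isPlacementConsistent(const LdsVariable &V) {
  return !V.IsAllocated || V.Address % V.Align == 0;
}

// Raise the alignment, but leave already allocated variables untouched so
// that a fixed address never ends up violating a newly increased alignment.
void raiseAlignment(LdsVariable &V, uint64_t NewAlign) {
  if (V.IsAllocated)
    return;
  if (NewAlign > V.Align)
    V.Align = NewAlign;
}
```

In this model a variable at address 24 with alignment 16 is reported as inconsistent (24 is not a multiple of 16), and skipping allocated variables in `raiseAlignment` avoids creating that state.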
2024-08-20[AMDGPU] Move AMDGPUMemoryUtils out of Utils. NFC. (#104930)Jay Foad1-1/+1
It is only used by CodeGen so does not need to be shared with the assembler/disassembler.
2024-07-17[AMDGPU] Use range-based for loops. NFC. (#99047)Jay Foad1-7/+6
2024-07-16[NFC] Fix typos (#98454)Akshat Oke1-4/+4
Co-authored-by: Akshat Oke <Akshat.Oke@amd.com>
2024-06-24Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"Stephen Tozer1-1/+1
Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24[IR][NFC] Update IRBuilder to use InsertPosition (#96497)Stephen Tozer1-1/+1
Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.
2024-06-06[AMDGPU] Update removeFnAttrFromReachable to accept array of Fn Attrs. (#94188)Chaitanya1-1/+1
This PR updates removeFnAttrFromReachable in AMDGPUMemoryUtils to accept array of function attributes as argument. Helps to remove multiple attributes in one CallGraph walk.
2024-05-20[AMDGPU] Use removeFnAttrFromReachable in lower-module-lds pass. (#92686)Chaitanya1-43/+1
2024-05-10[AMDGPU] Move LDS utilities from amdgpu-lower-module-lds pass to AMDGPUMemoryUtils (#88002)Chaitanya1-185/+1
This moves some of the utility methods from amdgpu-lower-module-lds pass to AMDGPUMemoryUtils.
2024-04-15Resolve static analyser report on pointer dereferencing after null check (#88278)mmoadeli1-18/+15
- Resolve Static Analyzer Check Failure: Pointer Dereferencing After Null Check.
- Minor naming and style improvement
2024-03-21[AMDGPU][LowerModuleLDS] Refactor partially lowered module detection (#85793)Pierre van Houtryve1-15/+25
Refactor the logic that checks if a module contains mixed absolute/non-lowered LDS GVs. The check now happens later when the "worklists" are formed. This is because in some cases (OpenMP) we can have non-lowered GVs in a lowered module, and this is normal because those GVs are just unused and removed from the list at some point before the end of `getUsesOfLDSByFunction`. Doing the check later ensures that if a mixed module is spotted, then it's a _real_ mixed module that needs rejection, not a module containing an intentionally ignored GV.
2024-03-11[AMDGPU] Let LowerModuleLDS run twice on the same module (#81729)Pierre van Houtryve1-4/+14
If all variables in the module are absolute, this means we're running the pass again on an already lowered module, and that works. If none of them are absolute, lowering can proceed as usual. Only diagnose cases where we have a mix of absolute/non-absolute GVs, which means we added LDS GVs after lowering, which is broken. See #81491 Split from #75333
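The three cases in this message (all absolute, none absolute, mixed) amount to a small classification. The following is a standalone sketch of that decision only: `LdsVariable`, `LoweringState`, and `classifyModule` are hypothetical names, and treating an empty module as "not lowered" is an assumption made for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: an LDS global and whether it carries an absolute
// address annotation from a previous lowering run.
struct LdsVariable {
  std::string Name;
  bool IsAbsolute;
};

enum class LoweringState {
  NotLowered,      // no variable is absolute: lower as usual
  AlreadyLowered,  // every variable is absolute: rerunning is harmless
  Mixed            // some are absolute, some are not: diagnose
};

LoweringState classifyModule(const std::vector<LdsVariable> &Vars) {
  bool AnyAbsolute = false;
  bool AnyNonAbsolute = false;
  for (const LdsVariable &V : Vars) {
    if (V.IsAbsolute)
      AnyAbsolute = true;
    else
      AnyNonAbsolute = true;
  }
  if (AnyAbsolute && AnyNonAbsolute)
    return LoweringState::Mixed;
  // Assumption: a module with no LDS variables counts as NotLowered.
  return AnyAbsolute ? LoweringState::AlreadyLowered
                     : LoweringState::NotLowered;
}
```

Only the `Mixed` result corresponds to the broken situation described above, where LDS globals were added after lowering.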
2024-01-10AMDGPU: Drop amdgpu-no-lds-kernel-id attribute in LDS lowering (#71481)Matt Arsenault1-0/+52
This is in preparation for moving the run of AMDGPUAttributor earlier. Currently it infers the lack of the corresponding intrinsic calls, so if we introduce new ones we need to remove the attribute from any possible transitive callers. This is more conservative than necessary, we could try to identify specific subgraphs where LDS globals are not used. Other options include teaching the attributor to avoid adding it in cases where the lowering may choose the table, but this seems more complex. Alternatively could add a second run which doesn't seem worth it. Depends #71349
2023-12-03[llvm] Stop including tuple (NFC)Kazu Hirata1-1/+0
Identified with clangd.
2023-11-10[llvm] Stop including llvm/ADT/SetVector.h (NFC)Kazu Hirata1-1/+0
Identified with clangd.
2023-11-07[NFC] Remove Type::getInt8PtrTy (#71029)Paulo Matos1-1/+1
Replace this with PointerType::getUnqual(). Followup to the opaque pointer transition. Fixes an in-code TODO item.
2023-09-11[NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilderJeremy Morse1-1/+2
This patch adds a two-argument SetInsertPoint method to IRBuilder that takes a block/iterator instead of an instruction, and updates many call sites to use it. The motivating reason for doing this is given here [0], we'd like to pass around more information about the position of debug-info in the iterator object. That necessitates passing iterators around most of the time. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152468
2023-09-02AMDGPU: Pass in TargetMachine to AMDGPULowerModuleLDSPassMatt Arsenault1-16/+46
https://reviews.llvm.org/D157660
2023-09-02AMDGPU: Use poison instead of undef in module lds passMatt Arsenault1-4/+4
2023-07-19[NFC][AMDGPULowerModuleLDSPass] Use shorter APIs in markUsedByKernelJuan Manuel MARTINEZ CAAMAÑO1-19/+9
* Use shorter versions of the LLVM API Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D155589
2023-07-19[NFC][AMDGPULowerModuleLDSPass] Cleanup of getTableLookupKernelIndexJuan Manuel MARTINEZ CAAMAÑO1-10/+6
* Do a single lookup when querying the map * Use shorter versions of the LLVM API Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D155588
2023-07-15[amdgpu] Accept an optional max to amdgpu-lds-size attribute for use in PromoteAllocaJon Chesterfield1-2/+18
2023-07-14[amdgpu] Delete elide-module-lds attributeJon Chesterfield1-16/+2
Requires D155190 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155238
2023-07-13[amdgpu][lds] Remove recalculation of LDS frame from backendJon Chesterfield1-9/+11
Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend.
Prior to this patch: The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol metadata so that the assembler can build lookup tables out of it. There is a fragile association between kernel functions and named structs which is used to recompute the frame layout in the backend, with fatal_errors catching inconsistencies in the second calculation.
After this patch: The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same absolute_symbol metadata that the assembler uses to place objects within that frame size. Deleted the now dead allocation code from the backend.
Left for a later cleanup:
- enabling lowering for anonymous functions
- removing the elide-module-lds attribute (test churn, it's not used by llc any more)
- adjusting the dynamic alignment check to not use symbol names
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D155190
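The idea of computing a frame once and recording both per-variable addresses and a total frame size can be sketched as a simple aligned layout. This is an illustrative model only: `VariableInfo`, `FrameLayout`, `alignTo`, and `layoutFrame` are hypothetical names defined here, variables are placed in the order given, and the real pass's ordering and metadata handling are not reproduced.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one LDS variable: size and alignment in bytes.
struct VariableInfo {
  uint64_t Size;
  uint64_t Align;
};

// Result of the single layout computation: one offset per variable (the
// value an absolute address annotation would record) plus the frame size.
struct FrameLayout {
  std::vector<uint64_t> Offsets;
  uint64_t FrameSize;
};

// Round Value up to the next multiple of Align (Align must be non-zero).
uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Place each variable at the next suitably aligned offset, in input order.
FrameLayout layoutFrame(const std::vector<VariableInfo> &Vars) {
  FrameLayout Layout;
  uint64_t Cursor = 0;
  for (const VariableInfo &V : Vars) {
    Cursor = alignTo(Cursor, V.Align);
    Layout.Offsets.push_back(Cursor);
    Cursor += V.Size;
  }
  Layout.FrameSize = Cursor;
  return Layout;
}
```

A consumer that trusts the recorded offsets and frame size has no need to recompute the layout, which is the duplication this commit removes from the backend.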
2023-07-13[amdgpu][lds] Raise an explicit unimplemented error on absolute address LDS variablesJon Chesterfield1-0/+5
These aren't implemented. They could be at moderate implementation complexity. Raising an error is better than silently miscompiling. Patching now because the patch at D155125 is a step towards using this metadata more extensively as part of the lowering path and that will interact badly with input variables with this annotation. Lowering user defined variables at specific addresses would drop this error, put them at the requested position in the frame during this pass, and then use the same codegen that will be used for the kernel specific struct shortly. Reviewed By: jmmartinez Differential Revision: https://reviews.llvm.org/D155132
2023-07-12[NFC][AMDGPULowerModuleLDSPass] Fix buildbot santizier failed to compileJuan Manuel MARTINEZ CAAMAÑO1-5/+6
It seems that the sanitizer-x86_64-linux-android wasn't able to deduce the template argument: AMDGPULowerModuleLDSPass.cpp:1192:53: error: no viable constructor or deduction guide for deduction of template arguments of 'vector' auto TableLookupVariablesOrdered = sortByName(std::vector( This patch makes the template argument explicit.
2023-07-12Reland "[NFC][AMDGPULowerModuleLDSPass] Factorize repetead sort code"Juan Manuel MARTINEZ CAAMAÑO1-25/+17
Fixed compilation error and redundant copy warning Differential Revision: https://reviews.llvm.org/D154977
2023-07-12[amdgpu][lds] Fix missing markUsedByKernel calls and undef lookup table elementsJon Chesterfield1-10/+16
More robust association between the kernels and lds struct. Use poison instead of value() for lookup table elements introduced by dynamic lds lowering. Extracted from D154946, new test from there verbatim. Segv fixed. Fixes issues/63338 Fixes SWDEV-404491 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154972
2023-07-11Revert "[NFC][AMDGPULowerModuleLDSPass] Factorize repetead sort code"Juan Manuel MARTINEZ CAAMAÑO1-16/+24
This reverts commit 125b90749a98d6dc6b492883c9617f9e91ab60e0.
2023-07-11[NFC][AMDGPULowerModuleLDSPass] Factorize repetead sort codeJuan Manuel MARTINEZ CAAMAÑO1-24/+16
Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D154970
2023-07-11[NFC][AMDGPULowerModuleLDSPass] Add const to some variables/parametersJuan Manuel MARTINEZ CAAMAÑO1-11/+13
Moving out some changes not related to the bugfix in https://reviews.llvm.org/D154946 Reviewed By: JonChesterfield, arsenm Differential Revision: https://reviews.llvm.org/D154959
2023-07-11[NFC][AMDGPULowerModuleLDSPass] Remove dead variableJuan Manuel MARTINEZ CAAMAÑO1-1/+0
2023-04-11[amdgpu][nfc] Update comments on LDS loweringJon Chesterfield1-7/+68