aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
AgeCommit message (Collapse)AuthorFilesLines
2025-08-19[AMDGPU] upstream barrier count reporting part1 (#154409)Gang Chen1-0/+2
2024-11-13[AMDGPU] Remove unused includes (NFC) (#116154)Kazu Hirata1-1/+0
Identified with misc-include-cleaner.
2024-11-06Remove unused variable to fix '[AMDGPU] modify named barrier builtins and ↵Thurston Dang1-1/+1
intrinsics (#114550)' https://github.com/llvm/llvm-project/pull/114550 caused a buildbot breakage (https://lab.llvm.org/buildbot/#/builders/66/builds/5853) because of an unused variable. This patch attempts to fix forward: /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp:106:24: error: variable 'TTy' set but not used [-Werror,-Wunused-but-set-variable] 106 | if (TargetExtType *TTy = AMDGPU::isNamedBarrier(GV)) { | ^
2024-11-06[AMDGPU] modify named barrier builtins and intrinsics (#114550)Gang Chen1-0/+8
Use a local pointer type to represent the named barrier in builtin and intrinsic. This makes the definitions more user friendly bacause they do not need to worry about the hardware ID assignment. Also this approach is more like the other popular GPU programming language. Named barriers should be represented as global variables of addrspace(3) in LLVM-IR. Compiler assigns the special LDS offsets for those variables during AMDGPULowerModuleLDS pass. Those addresses are converted to hw barrier ID during instruction selection. The rest of the instruction-selection changes are primarily due to the intrinsic-definition changes.
2024-10-03[AMDGPU] Qualify auto. NFC. (#110878)Jay Foad1-1/+1
Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)
2024-07-16[AMDGPU] Use member initializers. NFC.Jay Foad1-2/+1
2024-01-04[AMDGPU] Add dynamic LDS size implicit kernel argument to CO-v5 (#65273)Chaitanya1-9/+30
"hidden_dynamic_lds_size" argument will be added in the reserved section at offset 120 of the implicit argument layout. Add "isDynamicLDSUsed" flag to AMDGPUMachineFunction to identify if a function uses dynamic LDS. hidden argument will be added in below cases: - LDS global is used in the kernel. - Kernel calls a function which uses LDS global. - LDS pointer is passed as argument to kernel itself.
2023-08-21[AMDGPU] Add IsChainFunction to the MachineFunctionInfoDiana Picus1-0/+1
This will represent functions with the amdgpu_cs_chain or amdgpu_cs_chain_preserve calling conventions. Differential Revision: https://reviews.llvm.org/D156410
2023-07-15[amdgpu] Accept an optional max to amdgpu-lds-size attribute for use in ↵Jon Chesterfield1-1/+8
PromoteAlloca
2023-07-13[amdgpu][lds] Remove recalculation of LDS frame from backendJon Chesterfield1-77/+42
Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend. Prior to this patch: The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol metadata so that the assembler can build lookup tables out of it. There is a fragile association between kernel functions and named structs which is used to recompute the frame layout in the backend, with fatal_errors catching inconsistencies in the second calculation. After this patch: The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same absolute_symbol metadata that the assembler uses to place objects within that frame size. Deleted the now dead allocation code from the backend. Left for a later cleanup: - enabling lowering for anonymous functions - removing the elide-module-lds attribute (test churn, it's not used by llc any more) - adjusting the dynamic alignment check to not use symbol names Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155190
2023-04-04[amdgpu] Implement dynamic LDS accesses from non-kernel functionsJon Chesterfield1-1/+39
The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so. 1/ First it crawls the callgraph to work out which external LDS variables are reachable from a given kernel 2/ Then it creates a new `extern char[0]` variable for each kernel, which will alias all the other extern LDS variables because that's the documented behaviour of these variables 3/ The address of that variable is written to a lookup table. The global variable is tagged with metadata to track what address it was allocated at by codegen 4/ The assembler builds the lookup table using the metadata 5/ Any non-kernel functions use the same magic intrinsic used by table lookups of non-dynamic LDS variables to find the address to use Heavy overlap with the code paths taken for other lowering, in particular the same intrinsic is used to pass the dynamic scope information through the same sgpr as for table lookups of static LDS. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144233
2023-03-30[amdgpu] Fix broken error detection in LDS loweringJon Chesterfield1-2/+2
std::optional<uint32_t> can be compared to uint32_t without warning, but does not compare to the value within the optional. It needs to be prefixed *. Wconversion does not warn about this. ``` bool bug(uint32_t Offset, std::optional<uint32_t> Expect) { return (Offset != Expect); } bool deref(uint32_t Offset, std::optional<uint32_t> Expect) { return (Offset != *Expect); } ``` Both compile without warnings. Wrote the former, intended the latter. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D146775
2023-03-12[amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MDJon Chesterfield1-64/+37
Post ISel, LDS variables are absolute values. Representing them as such is simpler than the frame recalculation currently used to build assembler tables from their addresses. This is a precursor to lowering dynamic/external LDS accesses from non-kernel functions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144221
2022-12-21CodeGen: Don't lazily construct MachineFunctionInfoMatt Arsenault1-7/+10
This fixes what I consider to be an API flaw I've tripped over multiple times. The point this is constructed isn't well defined, so depending on where this is first called, you can conclude different information based on the MachineFunction. For example, the AMDGPU implementation inspected the MachineFrameInfo on construction for the stack objects and if the frame has calls. This kind of worked in SelectionDAG which visited all allocas up front, but broke in GlobalISel which hasn't visited any of the IR when arguments are lowered. I've run into similar problems before with the MIR parser and trying to make use of other MachineFunction fields, so I think it's best to just categorically disallow dependency on the MachineFunction state in the constructor and to always construct this at the same time as the MachineFunction itself. A missing feature I still could use is a way to access an custom analysis pass on the IR here.
2022-12-14[AMDGPU] Stop using make_pair and make_tuple. NFC.Jay Foad1-1/+1
C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828
2022-12-13[CodeGen] llvm::Optional => std::optionalFangrui Song1-1/+1
2022-12-07[amdgpu] Reimplement LDS loweringJon Chesterfield1-2/+55
Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variable basis. Unit tests are set up to pass with the default lowering of "module" or "hybrid" with this patch defaulting to "module", which will be a less dramatic codegen change relative to the current. This reflects the sparsity of test coverage for the table lowering method. Hybrid is better than module in every respect and will be default in a subsequent patch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139433
2022-12-06Revert "[amdgpu] Reimplement LDS lowering"Nico Weber1-55/+2
This reverts commit 982017240d7f25a8a6969b8b73dc51f9ac5b93ed. Breaks check-llvm, see https://reviews.llvm.org/D139433#3974862
2022-12-06[amdgpu] Reimplement LDS loweringJon Chesterfield1-2/+55
Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variable basis. Unit tests are set up to pass with the default lowering of "module" or "hybrid" with this patch defaulting to "module", which will be a less dramatic codegen change relative to the current. This reflects the sparsity of test coverage for the table lowering method. Hybrid is better than module in every respect and will be default in a subsequent patch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139433
2022-09-28[AMDGPU] Move SIModeRegisterDefaults to SI MFIStanislav Mekhanoshin1-1/+1
It does not belong to a general AMDGPU MFI. Differential Revision: https://reviews.llvm.org/D134666
2022-09-28[amdgpu][nfc] Allocate kernel-specific LDS struct deterministicallyJon Chesterfield1-6/+47
A kernel may have an associated struct for laying out LDS variables. This patch puts that instance, if present, at a deterministic address by allocating it at the same time as the module scope instance. This is relatively likely to be where the instance was allocated anyway (~NFC) but will allow later patches to calculate where a given field can be found, which means a function which is only reachable from a single kernel will be able to access a LDS variable with zero overhead. That will be particularly helpful for applications that instantiate a function template containing LDS variables once per kernel. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127052
2022-09-26Revert "[AMDGPU] Move SIModeRegisterDefaults to SI MFI"Vitaly Buka1-1/+1
Break msan bots. Details in D134666. This reverts commit 0ce96e06ee0226938e723bd0c8e16e3d2d51f203.
2022-09-26[AMDGPU] Move SIModeRegisterDefaults to SI MFIStanislav Mekhanoshin1-1/+1
It does not belong to a general AMDGPU MFI. Differential Revision: https://reviews.llvm.org/D134666
2022-07-19[amdgpu] Implement lds kernel id intrinsicJon Chesterfield1-0/+16
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it. There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue. It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address. The intent is to emit a __const array of LDS addresses and index into it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D125060
2022-05-04[amdgpu] Elide module lds allocation in kernels with no calleesJon Chesterfield1-2/+8
Introduces a string attribute, amdgpu-requires-module-lds, to allow eliding the module.lds block from kernels. Will allocate the block as before if the attribute is missing or has its default value of true. Patch uses the new attribute to detect the simplest possible instance of this, where a kernel makes no calls and thus cannot call any functions that use LDS. Tests updated to match, coverage was already good. Interesting cases is in lower-module-lds-offsets where annotating the kernel allows the backend to pick a different (in this case better) variable ordering than previously. A later patch will avoid moving kernel variables into module.lds when the kernel can have this attribute, allowing optimal ordering and locally unused variable elimination. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122091
2022-05-04[iwyu] Handle regressions in libLLVM header includeserge-sans-paille1-0/+1
Running iwyu-diff on LLVM codebase since fa5a4e1b95c8f37796 detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D124847
2022-04-19AMDGPU: Add assert for GDS globalsMatt Arsenault1-0/+3
2022-04-19AMDGPU: Fix allocating GDS globals to LDS offsetsMatt Arsenault1-9/+27
These don't seem to be very well used or tested, but try to make the behavior a bit more consistent with LDS globals. I'm not sure what the definition for amdgpu-gds-size is supposed to mean. For now I assumed it's allocating a static size at the beginning of the allocation, and any known globals are allocated after it.
2022-03-20Revert "[amdgpu][nfc] Pass function instead of module to ↵Jon Chesterfield1-2/+1
allocateModuleLDSGlobal" Reconsidered, better to handle per-function state in the constructor as before. This reverts commit 98e474c1b3210d90e313457bf6a6e39a7edb4d2b.
2022-03-19[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobalJon Chesterfield1-1/+2
2022-01-06[Target] Remove redundant member initialization (NFC)Kazu Hirata1-3/+2
Identified with readability-redundant-member-init.
2021-05-20[AMDGPU] Fix module LDS selectionStanislav Mekhanoshin1-1/+1
Accesses to global module LDS variable start from null, but kernel also thinks its variables start address is null. Fixed by not using a null as an address. Differential Revision: https://reviews.llvm.org/D102882
2021-04-17Normalize interaction with boolean attributesSerge Guelton1-4/+2
Such attributes can either be unset, or set to "true" or "false" (as string). throughout the codebase, this led to inelegant checks ranging from if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") to if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") Introduce a getValueAsBool that normalize the check, with the following behavior: no attributes or attribute set to "false" => return false attribute set to "true" => return true Differential Revision: https://reviews.llvm.org/D99299
2021-03-15[amdgpu] Implement lower function LDS passJon Chesterfield1-0/+12
[amdgpu] Implement lower function LDS pass Local variables are allocated at kernel launch. This pass collects global variables that are used from non-kernel functions, moves them into a new struct type, and allocates an instance of that type in every kernel. Uses are then replaced with a constantexpr offset. Prior to this pass, accesses from a function are compiled to trap. With this pass, most such accesses are removed before reaching codegen. The trap logic is left unchanged by this pass. It is still reachable for the cases this pass misses, notably the extern shared construct from hip and variables marked constant which survive the optimizer. This is of interest to the openmp project because the deviceRTL runtime library uses cuda shared variables from functions that cannot be inlined. Trunk llvm therefore cannot compile some openmp kernels for amdgpu. In addition to the unit tests attached, this patch applied to ROCm llvm with fixed-abi enabled and the function pointer hashing scheme deleted passes the openmp suite. This lowering will use more LDS than strictly necessary. It is intended to be a functionally correct fallback for cases that are difficult to target from future optimisation passes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94648
2021-01-07[NFC][AMDGPU] Reduce include files dependency.dfukalov1-1/+2
Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813
2020-12-14[AMDGPU] Mark amdgpu_gfx functions as module entry functionSebastian Neubauer1-5/+7
- Allows lds allocations - Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata Differential Revision: https://reviews.llvm.org/D92946
2020-08-20[amdgpu] Add codegen support for HIP dynamic shared memory.Michael Liao1-2/+19
Summary: - HIP uses an unsized extern array `extern __shared__ T s[]` to declare the dynamic shared memory, which size is not known at the compile time. Reviewers: arsenm, yaxunl, kpyzhov, b-sumner Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82496
2020-06-29[Alignment][NFC] Migrate AMDGPU backend to AlignGuillaume Chatelet1-4/+3
This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82743
2020-06-23Remove GlobalValue::getAlignment().Eli Friedman1-1/+1
This function is deceptive at best: it doesn't return what you'd expect. If you have an arbitrary GlobalValue and you want to determine the alignment of that pointer, Value::getPointerAlignment() returns the correct value. If you want the actual declared alignment of a function or variable, GlobalObject::getAlignment() returns that. This patch switches all the users of GlobalValue::getAlignment to an appropriate alternative. Differential Revision: https://reviews.llvm.org/D80368
2020-05-19AMDGPU: Use member initializers in MFIMatt Arsenault1-6/+1
2020-04-02AMDGPU: Remove denormal subtarget featuresMatt Arsenault1-1/+1
Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.
2019-11-19AMDGPU: Refactor treatment of denormal modeMatt Arsenault1-0/+1
Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.
2019-10-15[Alignment] Migrate Attribute::getWith(Stack)AlignmentGuillaume Chatelet1-1/+0
Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jdoerfert Reviewed By: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68792 llvm-svn: 374884
2019-07-05AMDGPU: Make AMDGPUPerfHintAnalysis an SCC passMatt Arsenault1-7/+7
Add a string attribute instead of directly setting MachineFunctionInfo. This avoids trying to get the analysis in the MachineFunctionInfo in a way that doesn't work with the new pass manager. This will also avoid re-visiting the call graph for every single function. llvm-svn: 365241
2019-01-19Update the file headers across all of the LLVM projects in the monorepoChandler Carruth1-4/+3
to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
2018-07-20Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering"Matt Arsenault1-2/+9
Reverts r337079 with fix for msan error. llvm-svn: 337535
2018-07-14Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering"Evgeniy Stepanov1-9/+2
This reverts commit r337021. WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7 #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3 #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3 #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18 #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23 #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5 #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5 #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3 #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26 [...] Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv' #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192 llvm-svn: 337079
2018-07-13AMDGPU: Fix handling of alignment padding in DAG argument loweringMatt Arsenault1-2/+9
This was completely broken if there was ever a struct argument, as this information is thrown away during the argument analysis. The offsets as passed in to LowerFormalArguments are not useful, as they partially depend on the legalized result register type, and they don't consider the alignment in the first place. Ignore the Ins array, and instead figure out from the raw IR type what we need to do. This seems to fix the padding computation if the DAG lowering is forced (and stops breaking arguments following padded arguments if the arguments were only partially lowered in the IR) llvm-svn: 337021
2018-06-28AMDGPU: Remove MFI::ABIArgOffsetMatt Arsenault1-2/+1
We have too many mechanisms for tracking the various offsets used for kernel arguments, so remove one. There's still a lot of confusion with these because there are two different "implicit" argument areas located at the beginning and end of the kernarg segment. Additionally, the offset was determined based on the memory size of the split element types. This would break in a future commit where v3i32 is decomposed into separate i32 pieces. llvm-svn: 335830
2018-05-25[AMDGPU] Add perf hints to functionsStanislav Mekhanoshin1-1/+13
This is adoption of HSAIL perfhint pass. Two types of hints are produced: 1. Function is memory bound. 2. Kernel can use wave limiter. Currently these hints are used in the scheduler. If a function is suspected to be memory bound we allow occupancy to decrease to 4 waves in the course of scheduling. Differential Revision: https://reviews.llvm.org/D46992 llvm-svn: 333289