path: root/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
Commit log (newest first): date, commit message, (author; files changed, -removed/+added lines)
20 hours ago  [AMDGPU][Attributor] Stop inferring amdgpu-no-flat-scratch-init in sanitized functions. (#161319)  (Chaitanya; 1 file, -2/+2)
This PR stops the attributor pass from inferring `amdgpu-no-flat-scratch-init` for functions marked with a `sanitize_*` attribute.
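A minimal sketch of the behavior described above, with made-up function names and IR (not taken from the test file): a kernel carrying `sanitize_address` should no longer receive `amdgpu-no-flat-scratch-init`, while an equivalent unsanitized kernel still can.

```llvm
target triple = "amdgcn-amd-amdhsa"

; Sanitized: the attributor is expected to leave flat-scratch init enabled,
; i.e. not add "amdgpu-no-flat-scratch-init" here.
define amdgpu_kernel void @asan_kernel(ptr addrspace(1) %out) sanitize_address {
  store i32 0, ptr addrspace(1) %out
  ret void
}

; Unsanitized leaf kernel: still a candidate for "amdgpu-no-flat-scratch-init".
define amdgpu_kernel void @plain_kernel(ptr addrspace(1) %out) {
  store i32 0, ptr addrspace(1) %out
  ret void
}
```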
2025-09-15  [AMDGPU][Attributor] Add `AAAMDGPUClusterDims` (#158076)  (Shilei Tian; 1 file, -20/+20)
2025-08-27  [AMDGPU][Attributor] Remove final update of waves-per-eu after the attributor run (#155246)  (Shilei Tian; 1 file, -44/+43)
We do not need this in the attributor, because `ST.getWavesPerEU` accounts for both the waves-per-eu and flat-workgroup-size attributes. If the waves-per-eu values are not valid, it drops them. In the attributor, we only need to propagate the values without using intermediate flat workgroup size values. Fixes SWDEV-550257.
2025-05-17  [AMDGPU][Attributor] Rework update of `AAAMDWavesPerEU` (#123995)  (Shilei Tian; 1 file, -36/+34)
Currently, we use `AAAMDWavesPerEU` to iteratively update values based on attributes from the associated function, potentially propagating user-annotated values, along with `AAAMDFlatWorkGroupSize` (which works similarly). However, since the value calculated from the flat workgroup size always dominates the user annotation (i.e., the attribute), running `AAAMDWavesPerEU` iteratively is unnecessary if no user-annotated value exists. This PR completely rewrites how the `amdgpu-waves-per-eu` attribute is handled in `AMDGPUAttributor`. The key changes are as follows:
- `AAAMDFlatWorkGroupSize` remains unchanged.
- `AAAMDWavesPerEU` now only propagates user-annotated values.
- A new function is added to check and update `amdgpu-waves-per-eu` based on the following rules:
  - No waves-per-eu, no flat workgroup size: assume a flat workgroup size of `1,1024` and compute waves-per-eu from it.
  - No waves-per-eu, flat workgroup size exists: use the provided flat workgroup size to compute waves-per-eu.
  - Waves-per-eu exists, no flat workgroup size: this is a tricky case. In this PR, we assume a flat workgroup size of `1,1024`, but this can be adjusted if a different approach is preferred. Alternatively, we could use the user-annotated value directly.
  - Both waves-per-eu and flat workgroup size exist: if there is a conflict, the value derived from the flat workgroup size takes precedence over waves-per-eu.

This PR also updates the logic for merging two waves-per-eu pairs. The current implementation, which uses `clampStateAndIndicateChange` to compute a union, might not be ideal. Thinking in terms of ensuring proper resource allocation: if one pair specifies a minimum of 2 waves per EU and another specifies a minimum of 4, we should guarantee that 4 waves per EU can be supported, as failing to do so could result in excessive resource allocation per wave. A similar principle applies to the upper bound. Thus, the PR merges two pairs `lo_a,up_a` and `lo_b,up_b` as `max(lo_a, lo_b), max(up_a, up_b)`; see the sketch after this entry. This ensures that resource allocation adheres to the stricter constraints from both inputs. Fixes #123092.
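A small worked example of the merge rule, with hypothetical functions and attribute values (a sketch, not part of the test file):

```llvm
target triple = "amdgcn-amd-amdhsa"

define internal void @shared_helper() {
  ret void
}

define amdgpu_kernel void @kernel_a() "amdgpu-waves-per-eu"="2,4" {
  call void @shared_helper()
  ret void
}

define amdgpu_kernel void @kernel_b() "amdgpu-waves-per-eu"="4,8" {
  call void @shared_helper()
  ret void
}
; Under the new rule, the range reaching @shared_helper is merged as
; max(2,4),max(4,8) = 4,8 rather than being clamped to a union of the two pairs.
```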
2025-04-24  AMDGPU: Remove amdhsa_code_object_version module flags from most tests (#136363)  (Matt Arsenault; 1 file, -6/+0)
These were added for the migration from v4 to v5 and should be removed now that the default has changed.
2025-04-15  [AMDGPU] Remove the AnnotateKernelFeatures pass (#130198)  (Jun Wang; 1 file, -331/+0)
Previously, the AnnotateKernelFeatures pass inferred two attributes, amdgpu-calls and amdgpu-stack-objects, which are used to help determine whether flat scratch init is allowed. PR #118907 created the amdgpu-no-flat-scratch-init attribute. Continuing with that work, this patch uses that attribute to determine flat scratch init, replacing amdgpu-calls and amdgpu-stack-objects. This also allows the AnnotateKernelFeatures pass to be removed.
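A hedged sketch of what this replacement means at the IR level (function names are made up; not from the test file): the attributor marks kernels where flat scratch setup is provably unneeded instead of marking the ones that have calls or stack objects.

```llvm
target triple = "amdgcn-amd-amdhsa"

declare void @external_callee()

; No calls and no stack objects: expected to end up with "amdgpu-no-flat-scratch-init".
define amdgpu_kernel void @leaf_kernel(ptr addrspace(1) %out) {
  store i32 1, ptr addrspace(1) %out
  ret void
}

; Calls an unknown external function: the attribute cannot be inferred, so flat
; scratch init stays enabled for this kernel.
define amdgpu_kernel void @calling_kernel() {
  call void @external_callee()
  ret void
}
```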
2025-03-13  AMDGPU: Replace ptr addrspace(1) undefs with poison (#130900)  (Matt Arsenault; 1 file, -48/+48)
Many tests use store to undef as a placeholder use, so just replace all of these with poison.
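An illustrative before/after of the placeholder-use pattern (the function name is hypothetical):

```llvm
define void @keep_value_alive(i32 %v) {
  ; Previously written as: store i32 %v, ptr addrspace(1) undef
  store i32 %v, ptr addrspace(1) poison
  ret void
}
```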
2025-03-06  AMDGPU: Replace amdgpu-no-agpr with amdgpu-agpr-alloc (#129893)  (Matt Arsenault; 1 file, -21/+21)
This performs the minimal replacement of amdgpu-no-agpr with amdgpu-agpr-alloc=0. Most of the test diffs are due to the new attribute sorting later alphabetically. We could do better by trying to perform range merging in the attributor, and trying to pick non-0 values.
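A sketch of the attribute-level change (the function and attribute-group number are made up):

```llvm
define void @uses_no_agprs() #0 {
  ret void
}

; Previously spelled: attributes #0 = { "amdgpu-no-agpr" }
attributes #0 = { "amdgpu-agpr-alloc"="0" }
```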
2024-12-11  [AMDGPU][Attributor] Make `AAAMDWavesPerEU` honor existing attribute (#114438)  (Shilei Tian; 1 file, -22/+24)
2024-12-11  [AMDGPU][Attributor] Make `AAAMDFlatWorkGroupSize` honor existing attribute (#114357)  (Shilei Tian; 1 file, -54/+49)
If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by taking its value directly; otherwise, use the default range as a starting point. We will no longer manipulate the known range, since doing so can cause issues: the known range is a "throttle" on the assumed range, so the assumed range can't be widened properly in `updateImpl` if the known range is not set correctly for whatever reason. Another benefit of not touching the known range is that, if we indicate a pessimistic state, it also invalidates the AA such that `manifest` will not be called. Since we honor the attribute, we do not want to add, and will not add, any half-baked attribute to a function.
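A hypothetical example of the case this changes: a user-annotated range that `initialize` now takes at face value instead of re-deriving it.

```llvm
define amdgpu_kernel void @annotated_kernel() "amdgpu-flat-work-group-size"="64,256" {
  ret void
}
```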
2024-12-09  Reapply "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647)" (#118907)  (Jun Wang; 1 file, -26/+27)
This reverts commit 1ef9410a96c1d9669a6feaf03fcab8d0a4a13bd5. This fixes the test file attributor-flatscratchinit-globalisel.ll.
2024-12-04  Revert "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647)"  (Philip Reames; 1 file, -27/+26)
This reverts commit e6aec2c12095cc7debd1a8004c8535eef41f4c36. The commit breaks "ninja check-llvm" on an x86 host.
2024-12-04  [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647)  (Jun Wang; 1 file, -26/+27)
The AMDGPUAnnotateKernelFeatures pass infers the "amdgpu-calls" and "amdgpu-stack-objects" attributes, which are used to infer whether we need to initialize flat scratch. This is, however, not precise. Instead, we should use AMDGPUAttributor and infer amdgpu-no-flat-scratch-init on kernels. Refer to https://github.com/llvm/llvm-project/issues/63586.
2024-03-21  AMDGPU: Infer no-agpr usage in AMDGPUAttributor (#85948)  (Matt Arsenault; 1 file, -22/+22)
SIMachineFunctionInfo scans the function body for inline asm that may use AGPRs, or for callees recorded in SIMachineFunctionInfo. Move this into the attributor so it actually works interprocedurally. We could probably avoid most of the test churn if this bothered to avoid adding the attribute on subtargets without AGPRs. We should also probably try to delete the MIR scan in usesAGPRs, but that seems trickier to eliminate.
2024-03-06  [AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905)  (Emma Pilkington; 1 file, -3/+3)
The previous name, 'amdgpu_code_object_version', was misleading since this is really a property of the HSA OS. The new spelling also matches the asm directive I added in bc82cfb.
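For reference, a minimal sketch of the renamed module flag as it typically appears in tests (the value 500 corresponds to code object v5):

```llvm
!llvm.module.flags = !{!0}
!0 = !{i32 1, !"amdhsa_code_object_version", i32 500}
```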
2023-12-12  [AMDGPU][NFC] Test autogenerated llc tests for COV5 (#74339)  (Saiyedul Islam; 1 file, -12/+19)
Regenerate a few llc tests to test for COV5 instead of the default ABI version.
2023-11-07  AMDGPU: Port AMDGPUAttributor to new pass manager (#71349)  (Matt Arsenault; 1 file, -1/+1)
2023-10-26  [opt] Infer DataLayout from triple if not specified  (Alex Richardson; 1 file, -2/+2)
There are many tests that specify a target triple/CPU flags but no DataLayout, which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is not currently possible is differentiating an explicitly empty `target datalayout = ""` in the IR file from a missing one, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change the IR parsers to track whether they have seen such a directive and change the callback type.
Differential Revision: https://reviews.llvm.org/D141060
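A hypothetical minimal test input illustrating the situation described above: the module states only a triple, and with this change opt fills in that target's default DataLayout rather than leaving it empty.

```llvm
target triple = "amdgcn-amd-amdhsa"
; No explicit "target datalayout" line: opt now derives the AMDGPU default layout
; from the triple unless it is overridden on the command line or in the file.

define void @empty() {
  ret void
}
```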
2023-09-12  Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060)  (Saiyedul Islam; 1 file, -12/+12)
This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.
2023-09-12  [AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)  (Saiyedul Islam; 1 file, -12/+12)
Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata
Reviewed By: arsenm, jhuber6
GitHub PR: #65410
Differential Revision: https://reviews.llvm.org/D129818
2023-06-16  AMDGPU: Propagate amdgpu-waves-per-eu with attributor  (Matt Arsenault; 1 file, -49/+53)
This will do a value range merging down the callgraph, unlike the current pass, which can only propagate values to undecorated functions from a kernel. This one is a bit weird due to the interaction with the implied range from amdgpu-flat-workgroup-size. At the default group range of 1,1024, the minimum implied bound is 4, so this ends up introducing the attribute on undecorated functions. We could probably simplify this by ignoring it and propagating the raw values. The subtarget interaction and the interaction with amdgpu-flat-workgroup-size only really clamp invalid values (plus the lower bound doesn't seem to do anything as far as I can tell anyway).
2023-01-12  Partially reapply "AMDGPU: Invert handling of enqueued block detection"  (Matt Arsenault; 1 file, -19/+109)
This mostly reverts commit 270e96f435596449002fc89962595497481c8770. Keep the attributor-related changes around, but functionally restore the old behavior as a workaround. Device enqueue goes back to not working at -O0 with this version.
2023-01-07  Revert "AMDGPU: Invert handling of enqueued block detection"  (Matt Arsenault; 1 file, -109/+19)
This reverts commit 47288cc977fa31c44cc92b4e65044a5b75c2597e. The runtime is having trouble with this at -O0 when the inputs are always enabled.
2023-01-06  AMDGPU: Invert handling of enqueued block detection  (Matt Arsenault; 1 file, -19/+109)
Invert the sense of the attribute and let the attributor figure this out like everything else. If needed, we can have the non-OpenCL languages set amdgpu-no-default-queue and amdgpu-no-completion-action up front so they never have to pay the cost. There are also so many of these now that the offset-use API should probably consider all of them at once; maybe they should merge into one attribute with used fields. Having separate functions for each field in AMDGPUBaseInfo is also not the greatest API (might as well fix this when the patch to get the object version from the module lands).
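A hypothetical illustration of the inverted scheme: a front end that never uses device enqueue could mark its kernels up front with the two negative attributes named above.

```llvm
define amdgpu_kernel void @no_enqueue_kernel() #0 {
  ret void
}

attributes #0 = { "amdgpu-no-default-queue" "amdgpu-no-completion-action" }
```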
2022-12-19  AMDGPU: Update some tests to use opaque pointers  (Matt Arsenault; 1 file, -121/+121)
vectorize-buffer-fat-pointer.ll required a manual check line fix. vector-alloca-addrspacecast.ll required a manual fixup of a check line. partial-regcopy-and-spill-missed-at-regalloc.ll required re-running update_mir_test_checks. The HSA metadata tests required avoiding the script touching the type name in the metadata. annotate-noclobber.ll ran into one update script bug. It deleted a check line with a 0 offset GEP, moving the following -NEXT check logically up one line.
2022-12-07  [AMDGPU] Annotate the intrinsics to be default and nocallback  (Johannes Doerfert; 1 file, -2/+2)
Differential Revision: https://reviews.llvm.org/D135155
2022-11-29  Revert "enable code-object-version=5"  (Ron Lieberman; 1 file, -11/+11)
Very sorry, wrong repo. This reverts commit d882ba7aeac4b496dccd1b10cb58bd691786b691.
2022-11-29  enable code-object-version=5  (Ron Lieberman; 1 file, -11/+11)
2022-11-04  [IR] Switch everything to use memory attribute  (Nikita Popov; 1 file, -2/+2)
This switches everything to use the memory attribute proposed in https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579. The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly attributes are dropped. The readnone, readonly and writeonly attributes are restricted to parameters only.

The old attributes are auto-upgraded both in bitcode and IR. The bitcode upgrade is a policy requirement that has to be retained indefinitely. The IR upgrade is mainly there so it's not necessary to update all tests using memory attributes in this patch, which is already large enough. We could drop that part after migrating tests, or retain it longer term to make it easier to import IR from older LLVM versions.

High-level Function/CallBase APIs like doesNotAccessMemory() or setDoesNotAccessMemory() are mapped transparently to the memory attribute. Code that directly manipulates attributes (e.g. via AttributeList), on the other hand, needs to switch to working with the memory attribute instead.

Differential Revision: https://reviews.llvm.org/D135780
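A small sketch of what the migration looks like on a declaration (the function name is made up; the auto-upgrade performs the equivalent rewrite):

```llvm
; Old spelling: declare void @touch(ptr) argmemonly readonly
declare void @touch(ptr) memory(argmem: read)
```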
2022-07-19  [amdgpu] Implement lds kernel id intrinsic  (Jon Chesterfield; 1 file, -17/+17)
Implement an intrinsic for use in lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it. There are a number of implicit arguments accessed by intrinsics already, so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue. It is necessary in the general case to put variables at different addresses so that they can be compactly allocated, and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel (in this implementation, based on metadata associated with that kernel), which is then passed on to indirect call sites, is sufficient to determine the variable address. The intent is to emit a __const array of LDS addresses and index into it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D125060
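A hedged sketch of how the intrinsic is used from IR (the kernel is illustrative only; the returned id is the value the kernel prologue writes into the claimed SGPR):

```llvm
declare i32 @llvm.amdgcn.lds.kernel.id()

define amdgpu_kernel void @use_kernel_id(ptr addrspace(1) %out) {
  %id = call i32 @llvm.amdgcn.lds.kernel.id()
  store i32 %id, ptr addrspace(1) %out
  ret void
}
```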
2022-04-12  AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally  (Changpeng Fang; 1 file, -17/+17)
Summary: Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is added by default. We use implicitarg_ptr + offset to check whether the multigrid synchronization pointer is used. If yes, we remove this attribute and also remove amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function.
Reviewers: arsenm, sameerds, b-sumner and foad
Differential Revision: https://reviews.llvm.org/D123548
2022-02-25  [AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg  (Changpeng Fang; 1 file, -17/+17)
Summary: Emit metadata for hidden_heap_v1 kernarg.
Reviewers: sameerds, b-sumner
Fixes: SWDEV-307188
Differential Revision: https://reviews.llvm.org/D119027
2022-02-11  [AMDGPU] replace hostcall module flag with function attribute  (Sameer Sahasrabuddhe; 1 file, -19/+19)
The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call graph. If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument. The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call graph. The attribute is inferred only if the function is not being sanitized, and the implicitarg_ptr does not result in a load of any byte in the hostcall pointer argument.
Reviewed By: jdoerfert, arsenm, kpyzhov
Differential Revision: https://reviews.llvm.org/D119216
2021-12-02  AMDGPU: Sanitized functions require implicit arguments  (Matt Arsenault; 1 file, -2/+92)
Do not infer amdgpu-no-implicitarg-ptr for sanitized functions. If a function is explicitly marked amdgpu-no-implicitarg-ptr and sanitize_address, infer that it is required.
2021-09-09  AMDGPU: Use attributor to propagate uniform-work-group-size  (Matt Arsenault; 1 file, -23/+21)
Drop the legacy version in AMDGPUAnnotateKernelFeatures. This has the side effect of now respecting the linkage, and not changing externally visible functions.
2021-09-09  AMDGPU: Invert ABI attribute handling  (Matt Arsenault; 1 file, -67/+53)
Previously we assumed all callable functions did not need any implicitly passed inputs, and added attributes to functions to indicate when they were necessary. Requiring attributes for correctness is pretty ugly, and it makes supporting indirect and external calls more complicated. This inverts the direction of the attributes, so an undecorated function is assumed to need all implicit inputs.

This enables AMDGPUAttributor by default to mark when functions are proven to not need a given input. This strips the equivalent functionality from the legacy AMDGPUAnnotateKernelFeatures pass. However, AMDGPUAnnotateKernelFeatures is not fully removed at this point, although it should be in the future. It is still necessary for the two hacky amdgpu-calls and amdgpu-stack-objects attributes, which would be better served by a trivial analysis on the IR during selection. Additionally, AMDGPUAnnotateKernelFeatures still redundantly handles the uniform-work-group-size attribute, to be removed in a future commit.

At this point, when not using -amdgpu-fixed-function-abi, we are still modifying the ABI based on these newly negated attributes. In the future, this option will be removed and the locations for implicit inputs will always be fixed. We will then use the new attributes to avoid passing the values when unnecessary.
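An illustrative sketch of the inverted scheme (function names and the particular attribute set are hypothetical): an undecorated function is assumed to need every implicit input, and the attributor adds the negative attributes it can prove.

```llvm
; Assumed to need all implicit inputs (workitem/workgroup ids, implicitarg pointer, ...).
define void @undecorated_helper() {
  ret void
}

; After the attributor proves some inputs unused, the function carries negative attributes.
define void @proven_simple_helper() #0 {
  ret void
}

attributes #0 = { "amdgpu-no-implicitarg-ptr" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" }
```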
2021-08-26  AMDGPU: Invert AMDGPUAttributor  (Matt Arsenault; 1 file, -21/+20)
Switch to using BitIntegerState for each of the inputs, and invert their meanings. This now diverges more from the old AMDGPUAnnotateKernelFeatures, but this isn't used yet anyway.
2021-08-26  AMDGPU: Restrict attributor transforms  (Matt Arsenault; 1 file, -99/+130)
We only really want this to add the custom attributes. Theoretically the regular transforms were already run at this point. Touching undefined behavior breaks a lot of tests when this is enabled by default, many of which are expecting to test handling of undef operations.
2021-08-26  AMDGPU: Remove hacky attribute deduction from AMDGPUAttributor  (Matt Arsenault; 1 file, -32/+31)
amdgpu-calls and amdgpu-stack-objects don't really belong as attributes, and are currently a hacky way of passing an analysis into the DAG. These don't really belong in the IR, and don't really fit in with the other attributes. Remove these to facilitate inverting the pass. I don't exactly understand the indirect call test changes. These tests are using calls which are trivially replaceable with a direct call, so I'm not sure what the point is.
2021-08-26  AMDGPU: Stop inferring use of llvm.amdgcn.kernarg.segment.ptr  (Matt Arsenault; 1 file, -25/+24)
We no longer use this intrinsic outside of the backend and no longer support using it outside of kernels.
2021-08-13  AMDGPU: Stop attributor adding attributes to intrinsic declarations  (Matt Arsenault; 1 file, -1/+1)
2021-08-13  AMDGPU: Add indirect and extern calls to attributor test  (Matt Arsenault; 1 file, -0/+77)
2021-08-12  [Attributor] Do not delete volatile stores to null/undef  (Johannes Doerfert; 1 file, -0/+16)
See D106309.
Differential Revision: https://reviews.llvm.org/D107906
2021-07-27  [Attributor][FIX] Update AMDGPU attributor test  (Johannes Doerfert; 1 file, -147/+173)
The test contains UB and should be improved; for now, we just update the check lines to get past it.
2021-07-24  [AMDGPU] Deduce attributes with the Attributor  (Kuter Dinel; 1 file, -169/+367)
This patch introduces a pass that uses the Attributor to deduce AMDGPU-specific attributes.
Reviewed By: jdoerfert, arsenm
Differential Revision: https://reviews.llvm.org/D104997
2021-07-15  [AMDGPU] Use update_test_checks.py script for annotate kernel features tests.  (Kuter Dinel; 1 file, -67/+271)
This patch makes the annotate kernel features tests use the update_test_checks.py script, which makes it easy to update the tests.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D105864
2021-04-20  [AMDGPU] Remove error check for indirect calls and add missing queue-ptr  (madhur13490; 1 file, -1/+1)
This patch removes the -fixed-abi check for indirect calls and also adds queue-ptr, which is required for indirect calls to work.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D100633
2021-04-13  [AMDGPU] Set implicit arg attributes for indirect calls  (madhur13490; 1 file, -1/+1)
This patch adds attributes corresponding to implicits to functions/kernels if (1) they have an indirect call, or (2) their address is taken. Once such attributes are set, the rest of the codegen works out of the box for indirect calls. This patch eliminates the potential overhead -fixed-abi imposes even when indirect function calls are not used.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D99347
2020-06-18  AMDGPU: Add IntrWillReturn to intrinsic definitions  (Matt Arsenault; 1 file, -1/+1)
This should probably be implied for all the speculatable ones. I think the only ones where this plausibly doesn't apply are s_sendmsghalt and maybe kill.
2020-04-04  AMDGPU: Fix annotate kernel features through casted calls  (Matt Arsenault; 1 file, -0/+15)
I thought I was testing this before, but the workitem id x case isn't great since it's mandatory in the parent kernel.