path: root/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
Age  Commit message  Author  Files  Lines
6 days[AMDGPU] Update comments in memory legalizer. NFC (#160453)Stanislav Mekhanoshin1-5/+14
6 days[AMDGPU] SIMemoryLegalizer: Factor out check if memory operations can affect the global AS (#160129)Fabian Ritter1-18/+31
Mostly NFC, and adds an assertion for gfx12 to ensure that no atomic scratch instructions are present in the case of GloballyAddressableScratch. This should always hold because of #154710.
7 days[AMDGPU] Insert waitcnt for non-global fence release in GFX12 (#159282)Fabian Ritter1-38/+38
A fence release could be followed by a barrier, so it should wait for the relevant memory accesses to complete, even if it is mmra-limited to LDS. So far, that would be skipped for non-global fence releases. Fixes SWDEV-554932.
2025-09-12[NFC][AMDGPU][SIMemoryLegalizer] remove effectively empty function (#156806)Sameer Sahasrabuddhe1-39/+0
The removed function SIGfx90ACacheControl::enableLoadCacheBypass() does not actually do anything except one assert and one unreachable.
2025-09-10[AMDGPU][gfx1250] Support "cluster" syncscope (#157641)Pierre van Houtryve1-15/+36
Defaults to "agent" for targets that do not support it.
- Add documentation
- Register it in MachineModuleInfo
- Add MemoryLegalizer support
2025-09-10[AMDGPU][gfx1250] Remove SCOPE_SE for scratch stores (#157640)Pierre van Houtryve1-5/+0
2025-09-10Revert "[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588)" (#157639)Pierre van Houtryve1-3/+1
This reverts commit be17791f2624f22b3ed24a2539406164a379125d. This is not necessary for gfx1250 anymore.
2025-09-10[AMDGPU][gfx1250] Implement SIMemoryLegalizer (#154726)Pierre van Houtryve1-19/+56
Implements the base of the MemoryLegalizer for a roughly correct GFX1250 memory model. Documentation will come later, and some remaining changes still have to be added, but this is the backbone of the model.
2025-09-04[AMDGPU][gfx1250] Add 128B cooperative atomics (#156418)Pierre van Houtryve1-3/+41
- Add clang built-ins + sema/codegen
- Add IR Intrinsic + verifier
- Add DAG/GlobalISel codegen for the intrinsics
- Add lowering in SIMemoryLegalizer using a MMO flag.
2025-09-02[AMDGPU] Reenable BackOffBarrier on GFX11/12 (#155370)Pierre van Houtryve1-1/+30
Re-enable it by adding a wait on vm_vsrc before every barrier "start" instruction in GFX10/11/12 CU mode. This is a weaker wait than what we do without BackOffBarrier, so it should not introduce any new guarantees that can be abused. Instead, it relaxes the guarantees we have now to the bare minimum needed to support the behavior users want (fence release + barrier works). There is an exact memory model in the works which will be documented separately.
2025-07-30[AMDGPU] introduce S_WAITCNT_LDS_DIRECT in the memory legalizer (#150887)Sameer Sahasrabuddhe1-0/+20
The new instruction represents the unknown number of waitcnts needed at a release operation to ensure that prior direct loads to LDS (formerly called LDS DMA) are completed. The instruction is replaced in SIInsertWaitcnts with a suitable value for vmcnt(). Co-authored-by: Austin Kerbow <austin.kerbow@amd.com>.
2025-07-29[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588)Pierre van Houtryve1-1/+3
Determines whether we can use `SCOPE_CU` stores (on by default), or whether all stores must be done at `SCOPE_SE` minimum.
2025-07-28[AMDGPU][gfx12] Clean-up implementation of waits before SCOPE_SYS stores (#150587)Pierre van Houtryve1-13/+9
We can do it all in finalizeStore if we ensure it always sees the stores. For that, I needed to fix a hidden bug where finalizeStore wouldn't see all stores because sometimes the iterator got out-of-sync and didn't point to the store anymore. This also removes the waits before volatile LDS stores, which never needed them; that was a bug until now.
2025-07-28[AMDGPU][gfx1250] Use SCOPE_SE for stores that may hit scratch (#150586)Pierre van Houtryve1-7/+24
2025-07-24[NFC][AMDGPU] Refactor handling of `amdgpu-synchronize-as` MD on fences (#148630)Pierre van Houtryve1-13/+19
Directly plug it into the MMO instead, which is much cleaner.
2025-07-24[NFC][AMDGPU] Rename "amdgpu-as" to "amdgpu-synchronize-as" (#148627)Pierre van Houtryve1-2/+2
"amdgpu-as" is way too vague and doesn't give enough context. We may want to support it on normal atomics too, to control the synchronized (ordered) AS. If we do that, the name has to be less vague.
2025-06-20[AMDGPU] Don't insert wait instructions that are not supported by gfx1250 (#145084)Stanislav Mekhanoshin1-2/+4
No tests yet, but it will allow further tests not to be polluted with these waits.
2025-05-28Warn on misuse of DiagnosticInfo classes that hold Twines (#137397)Justin Bogner1-4/+4
This annotates the `Twine` passed to the constructors of the various DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes us to warn when we would try to print the twine after it had already been destructed. We also update `DiagnosticInfoUnsupported` to hold a `const Twine &` like all of the other DiagnosticInfo classes, since this warning allows us to clean up all of the places where it was being used incorrectly.
2025-03-12[AMDGPU][NPM] Port SIMemoryLegalizer to NPM (#130060)Akshat Oke1-10/+33
2025-02-19[AMDGPU] Remove FeatureForceStoreSC0SC1 (#126878)Fabian Ritter1-20/+0
This was only used for gfx940 and gfx941, which have since been removed. For SWDEV-512631
2025-02-19[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (#126763)Fabian Ritter1-1/+0
gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all non-documentation occurrences of gfx940/gfx941 from the llvm directory, and the remaining occurrences in clang. Documentation changes will follow. For SWDEV-512631
2024-10-07[AMDGPU] Only emit SCOPE_SYS global_wb (#110636)Pierre van Houtryve1-29/+7
global_wb with scopes lower than SCOPE_SYS is unnecessary for correctness. I was initially optimistic they would be very cheap no-ops but they can actually be quite expensive so let's avoid them.
2024-09-09[AMDGPU] Document & Finalize GFX12 Memory Model (#98599)Pierre van Houtryve1-87/+98
Documents the memory model implemented as of #98591, with some fixes/optimizations to the implementation.
2024-07-22AMDGPU: Query MachineModuleInfo from PM instead of MachineFunction (#99679)Matt Arsenault1-6/+8
2024-07-16[AMDGPU] Fix and add namespace closing comments. NFC.Jay Foad1-1/+1
2024-07-16[AMDGPU] Implement GFX12 Memory Model (#98591)Pierre van Houtryve1-0/+129
- Emit GLOBAL_WB instructions
- Reflect syncscope on the instructions' `scope:` operand
Fixes SWDEV-468508 Fixes SWDEV-470735 Fixes SWDEV-468392 Fixes SWDEV-469622
2024-05-27[AMDGPU] Add amdgpu-as MMRA for fences (#78572)Pierre van Houtryve1-8/+56
Using MMRAs, allow `builtin_amdgcn_fence` to emit fences that only target one or more address spaces, instead of fencing all address spaces at once. This is done through an `amdgpu-as` MMRA. Currently focused on OpenCL fences, but can very easily support more AS names and codegen on more than just fences.
2024-03-06[AMDGPU] Handle amdgpu.last.use metadata (#83816)Mirko Brkušanin1-44/+55
Convert !amdgpu.last.use metadata into MachineMemOperand for last use and handle it in SIMemoryLegalizer similar to nontemporal and volatile.
2024-03-04[AMDGPU] Fix setting nontemporal in memory legalizer (#83815)Mirko Brkušanin1-5/+5
The iterator MI can advance in insertWait(), but we need the original instruction to set the temporal hint. Just move it before handling volatile.
2024-02-28AMDGPU/GFX12: Insert waitcnts before stores with scope_sys (#82996)Petar Avramovic1-0/+47
Insert waitcnts for loads and atomics before stores with system scope. Scope is a field in the instruction encoding and corresponds to the desired coherence level in the cache hierarchy. Intrinsic stores can set scope in the cache policy operand. If the volatile keyword is used on generic stores, the memory legalizer will set scope to system. Generic stores, by default, get the lowest scope level. Waitcnts are not required if it is guaranteed that memory is cached. For example, Vulkan shaders can guarantee this. TODO: implement a flag for frontends to give us a hint not to insert waits. Expecting the Vulkan flag to be implemented as a vulkan:private MMRA.
2024-02-13[AMDGPU][SIMemoryLegalizer] Fix order of GL0/1_INV on GFX10/11 (#81450)Pierre van Houtryve1-1/+4
Fixes SWDEV-443292
2024-01-18[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438)Jay Foad1-0/+180
Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into separate LOADCNT, SAMPLECNT and BVHCNT counters.
2024-01-08[AMDGPU] Add new cache flushing instructions for GFX12 (#76944)Mirko Brkušanin1-2/+68
Co-authored-by: Diana Picus <Diana-Magda.Picus@amd.com>
2023-12-15[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830)Pierre van Houtryve1-5/+7
Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>
2023-05-12AMDGPU: Force sc0 and sc1 on stores for gfx940 and gfx941Konstantin Zhuravlyov1-2/+21
Differential Revision: https://reviews.llvm.org/D149986
2023-03-08[AMDGPU] Skip buffer_wbl2 before atomic fence acquireStanislav Mekhanoshin1-2/+7
Memory models for gfx90a and gfx940 do not require buffer_wbl2 before the fence for acquire ordering, but we do insert the full release. Fixes: SWDEV-386785 Differential Revision: https://reviews.llvm.org/D145524
2023-02-07[NFC][TargetParser] Remove llvm/Support/TargetParser.hArchibald Elliott1-1/+1
2022-12-17std::optional::value => operator*/operator->Fangrui Song1-4/+4
value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable without _LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). This fixes clang.
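The kind of change this commit describes can be sketched as follows. This is a minimal illustration, not code from SIMemoryLegalizer.cpp: `valueOrFallback`, `Range`, and `width` are hypothetical names. It assumes the caller tests the optional before dereferencing, since `operator*` and `operator->` do no emptiness check of their own.

```cpp
#include <optional>

// Hypothetical helper for illustration; not taken from SIMemoryLegalizer.cpp.
// Returns the contained value, or Fallback when the optional is empty.
int valueOrFallback(const std::optional<int> &MaybeValue, int Fallback) {
  if (!MaybeValue)
    return Fallback;
  // operator* never throws, unlike value(), which throws
  // std::bad_optional_access on an empty optional. Emptiness was
  // already ruled out by the test above.
  return *MaybeValue;
}

// Hypothetical aggregate used to show member access through operator->.
struct Range {
  int Lo;
  int Hi;
};

// Returns Hi - Lo for an engaged optional, or 0 for an empty one.
int width(const std::optional<Range> &MaybeRange) {
  return MaybeRange ? MaybeRange->Hi - MaybeRange->Lo : 0;
}
```

Dereferencing an empty optional with `operator*` is undefined behavior, so the explicit test before each access is what keeps this form safe.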
2022-12-14[AMDGPU] Stop using make_pair and make_tuple. NFC.Jay Foad1-30/+18
C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828
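As a rough sketch of the pattern, the two spellings below build the same object. The function names are hypothetical and are not taken from the commit; the only thing relied on is C++17 class template argument deduction for `std::pair` and `std::tuple`.

```cpp
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical example functions; not taken from SIMemoryLegalizer.cpp.

// Pre-C++17 spelling: the helper function deduces the element types.
std::pair<int, std::string> withHelper(int Id, const std::string &Name) {
  return std::make_pair(Id, Name);
}

// C++17 spelling: class template argument deduction on the constructor.
std::pair<int, std::string> withConstructor(int Id, const std::string &Name) {
  return std::pair(Id, Name);
}

// The deduced types are the decayed argument types.
static_assert(
    std::is_same_v<decltype(std::pair(1, 2.5)), std::pair<int, double>>);
static_assert(std::is_same_v<decltype(std::tuple(1, 2u, true)),
                             std::tuple<int, unsigned, bool>>);
```

One difference to keep in mind: `std::make_pair` unwraps `std::reference_wrapper` arguments into references, while the deduced constructor form stores the wrapper itself.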
2022-12-13[CodeGen] llvm::Optional => std::optionalFangrui Song1-20/+20
2022-12-08[llvm] Use std::nullopt instead of None in comments (NFC)Kazu Hirata1-4/+5
This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02[Target] Use std::nullopt instead of None (NFC)Kazu Hirata1-10/+10
This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-09-02[AMDGPU][NFC] Fix typo in commment: replace SiMemOpInfo by SIMemOpInfoJuan Manuel MARTINEZ CAAMAÑO1-2/+2
2022-07-13[llvm] Use value instead of getValue (NFC)Kazu Hirata1-4/+4
2022-06-25Revert "Don't use Optional::hasValue (NFC)"Kazu Hirata1-4/+4
This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.
2022-06-25Don't use Optional::hasValue (NFC)Kazu Hirata1-4/+4
2022-06-20[llvm] Don't use Optional::getValue (NFC)Kazu Hirata1-3/+3
2022-06-10[AMDGPU] Update dlc usage for GFX11Jay Foad1-1/+112
In GFX10 dlc controlled L1 cache bypass. In GFX11 it has been repurposed to control MALL NOALLOC, and glc controls L1 as well as L0 cache bypass. Update the documentation and SIMemoryLegalizer accordingly. Set dlc for nontemporal and volatile accesses. Differential Revision: https://reviews.llvm.org/D127405
2022-03-16Cleanup codegen includesserge-sans-paille1-0/+1
This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681
2022-03-14[AMDGPU] gfx940 memory modelStanislav Mekhanoshin1-0/+354
Differential Revision: https://reviews.llvm.org/D121242