author | Krzysztof Drewniak <Krzysztof.Drewniak@amd.com> | 2024-03-11 10:06:49 -0500 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-03-11 10:06:49 -0500 |
commit | b05c15259bcbe3eba353b77ca4fc9ec2a81dd3fb (patch) | |
tree | 7c4636f4c1da61f637f12d9b5bbf68e881b5c3fd /mlir/test/Target | |
parent | 63af8584fc7ea81ef6f2176e0ada0533a3495745 (diff) | |
download | llvm-b05c15259bcbe3eba353b77ca4fc9ec2a81dd3fb.zip llvm-b05c15259bcbe3eba353b77ca4fc9ec2a81dd3fb.tar.gz llvm-b05c15259bcbe3eba353b77ca4fc9ec2a81dd3fb.tar.bz2 |
[mlir][AMDGPU] Improve amdgpu.lds_barrier, add warnings (#77942)
On some architectures (currently gfx90a, gfx94*, and gfx10**), we can
implement an LDS barrier using compiler intrinsics instead of inline
assembly, improving optimization possibilities and decreasing the
fragility of the underlying code.
Other AMDGPU chipsets continue to require inline assembly to implement
this barrier because, by default, the LLVM backend inserts waits on
global memory (s_waitcnt vmcnt(0)) before barriers in order to ensure
that memory watchpoints set by debuggers work correctly.
Use of amdgpu.lds_barrier on these architectures imposes a tradeoff
between debuggability and performance. The documentation, as well as the
generated inline assembly, has been updated to explicitly call
attention to this fact.
For chipsets that do not require the inline assembly hack, we move to
the s.waitcnt and s.barrier intrinsics, which have been added to the
ROCDL dialect. The magic constants used as the argument to the waitcnt
intrinsic can be derived from
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp.
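
As a rough illustration of how such a waitcnt constant might be derived, the sketch below packs the three counters into one immediate. The field layout (vmcnt in bits [3:0] and [15:14], expcnt in bits [6:4], lgkmcnt in bits [11:8]) is an assumption about gfx9-class targets, and `WaitcntFields`, `encodeWaitcntGfx9`, and `ldsOnlyWaitGfx9` are hypothetical names that appear in neither LLVM nor this patch. Other chipset generations lay the fields out differently, so AMDGPUBaseInfo.cpp remains the authoritative source.

```cpp
#include <cstdint>

// Hypothetical helper types and functions for illustration only; they are
// not part of LLVM, MLIR, or this commit.
struct WaitcntFields {
  uint32_t vmcnt;   // outstanding vector (global) memory operations
  uint32_t expcnt;  // outstanding export / GDS operations
  uint32_t lgkmcnt; // outstanding LDS, GDS, constant, and message operations
};

// ASSUMED gfx9-style layout:
//   vmcnt   : bits [3:0] (low part) and bits [15:14] (high part)
//   expcnt  : bits [6:4]
//   lgkmcnt : bits [11:8]
constexpr uint32_t encodeWaitcntGfx9(WaitcntFields f) {
  return (f.vmcnt & 0xFu)                    // low 4 bits of vmcnt
         | ((f.expcnt & 0x7u) << 4)          // 3-bit expcnt
         | ((f.lgkmcnt & 0xFu) << 8)         // 4-bit lgkmcnt
         | (((f.vmcnt >> 4) & 0x3u) << 14);  // high 2 bits of vmcnt
}

// Wait until lgkmcnt reaches 0 while leaving vmcnt and expcnt at their
// maximum values, so the wait does not block on global memory or exports.
constexpr uint32_t ldsOnlyWaitGfx9() {
  return encodeWaitcntGfx9(WaitcntFields{/*vmcnt=*/63, /*expcnt=*/7,
                                         /*lgkmcnt=*/0});
}
```

Under this assumed layout, setting every counter to 0 yields the immediate 0, which waits on all counters, and `ldsOnlyWaitGfx9()` evaluates to 0xC07F.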
Diffstat (limited to 'mlir/test/Target')
-rw-r--r-- | mlir/test/Target/LLVMIR/rocdl.mlir | 16 |
1 files changed, 16 insertions, 0 deletions
diff --git a/mlir/test/Target/LLVMIR/rocdl.mlir b/mlir/test/Target/LLVMIR/rocdl.mlir
index 3ea6292..d35acb0 100644
--- a/mlir/test/Target/LLVMIR/rocdl.mlir
+++ b/mlir/test/Target/LLVMIR/rocdl.mlir
@@ -88,7 +88,23 @@ llvm.func @rocdl.bpermute(%src : i32) -> i32 {
   llvm.return %0 : i32
 }
 
+llvm.func @rocdl.waitcnt() {
+  // CHECK-LABEL: rocdl.waitcnt
+  // CHECK-NEXT: call void @llvm.amdgcn.s.waitcnt(i32 0)
+  rocdl.waitcnt 0
+  llvm.return
+}
+
+llvm.func @rocdl.s.barrier() {
+  // CHECK-LABEL: rocdl.s.barrier
+  // CHECK-NEXT: call void @llvm.amdgcn.s.barrier()
+  rocdl.s.barrier
+  llvm.return
+}
+
+
 llvm.func @rocdl.barrier() {
+  // CHECK-LABEL: rocdl.barrier
   // CHECK: fence syncscope("workgroup") release
   // CHECK-NEXT: call void @llvm.amdgcn.s.barrier()
   // CHECK-NEXT: fence syncscope("workgroup") acquire