author    Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>  2024-03-11 10:06:49 -0500
committer GitHub <noreply@github.com>  2024-03-11 10:06:49 -0500
commit    b05c15259bcbe3eba353b77ca4fc9ec2a81dd3fb
tree      7c4636f4c1da61f637f12d9b5bbf68e881b5c3fd /mlir/test/Target
parent    63af8584fc7ea81ef6f2176e0ada0533a3495745
[mlir][AMDGPU] Improve amdgpu.lds_barrier, add warnings (#77942)
On some architectures (currently gfx90a, gfx94*, and gfx10**), we can implement an LDS barrier using compiler intrinsics instead of inline assembly, which improves optimization opportunities and makes the underlying code less fragile.

Other AMDGPU chipsets still require inline assembly to implement this barrier because, by default, the LLVM backend inserts a wait on global memory (s_waitcnt vmcnt(0)) before each barrier so that memory watchpoints set by debuggers work correctly. On those chipsets, using amdgpu.lds_barrier therefore trades debuggability for performance. The documentation and the generated inline assembly have been updated to call attention to this explicitly.

For chipsets that do not need the inline assembly hack, the lowering now uses the s.waitcnt and s.barrier intrinsics, which have been added to the ROCDL dialect as rocdl.waitcnt and rocdl.s.barrier.

The magic constants passed as the argument to the waitcnt intrinsic can be derived from llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp.
Diffstat (limited to 'mlir/test/Target')
-rw-r--r-- mlir/test/Target/LLVMIR/rocdl.mlir 16
1 files changed, 16 insertions, 0 deletions
diff --git a/mlir/test/Target/LLVMIR/rocdl.mlir b/mlir/test/Target/LLVMIR/rocdl.mlir
index 3ea6292..d35acb0 100644
--- a/mlir/test/Target/LLVMIR/rocdl.mlir
+++ b/mlir/test/Target/LLVMIR/rocdl.mlir
@@ -88,7 +88,23 @@ llvm.func @rocdl.bpermute(%src : i32) -> i32 {
   llvm.return %0 : i32
 }
 
+llvm.func @rocdl.waitcnt() {
+  // CHECK-LABEL: rocdl.waitcnt
+  // CHECK-NEXT: call void @llvm.amdgcn.s.waitcnt(i32 0)
+  rocdl.waitcnt 0
+  llvm.return
+}
+
+llvm.func @rocdl.s.barrier() {
+  // CHECK-LABEL: rocdl.s.barrier
+  // CHECK-NEXT: call void @llvm.amdgcn.s.barrier()
+  rocdl.s.barrier
+  llvm.return
+}
+
+
 llvm.func @rocdl.barrier() {
+  // CHECK-LABEL: rocdl.barrier
   // CHECK: fence syncscope("workgroup") release
   // CHECK-NEXT: call void @llvm.amdgcn.s.barrier()
   // CHECK-NEXT: fence syncscope("workgroup") acquire