[mlir][AMDGPU] Improve amdgpu.lds_barrier, add warnings (#77942)

On some architectures (currently gfx90a, gfx94*, and gfx10**), we can implement an LDS barrier using compiler intrinsics instead of inline assembly, improving optimization possibilities and decreasing the fragility of the underlying code.

Other AMDGPU chipsets continue to require inline assembly to implement this barrier because, by default, the LLVM backend inserts waits on global memory (s_waitcnt vmcnt(0)) before barriers to ensure that memory watchpoints set by debuggers work correctly.

Use of amdgpu.lds_barrier on these architectures imposes a tradeoff between debuggability and performance. The documentation, as well as the generated inline assembly, has been updated to explicitly call attention to this fact.

For chipsets that do not require the inline assembly hack, we move to the s.waitcnt and s.barrier intrinsics, which have been added to the ROCDL dialect. The magic constants used as the argument to the waitcnt intrinsic can be derived from llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp.
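As a rough illustration of how such a waitcnt constant can be derived, the sketch below builds an operand that waits on a single counter by zeroing that counter's bit field and leaving every other bit set, so the remaining counters impose no wait. The helper names are hypothetical, and the field position used in the example (a 4-bit lgkmcnt field starting at bit 8) is an assumption for illustration only; the actual per-chipset shifts and widths should be read from AMDGPUBaseInfo.cpp.

```cpp
#include <cstdint>

// Bit mask covering `width` bits starting at bit `shift` (width must be < 32).
constexpr uint32_t counterFieldMask(unsigned shift, unsigned width) {
  return ((1u << width) - 1u) << shift;
}

// Operand that waits only on the counter stored in the given field:
// that field is cleared to 0 and all other bits remain set.
constexpr uint32_t waitOnlyOn(unsigned shift, unsigned width) {
  return ~counterFieldMask(shift, width);
}

// ASSUMPTION (illustrative, not taken from this patch): lgkmcnt occupies
// 4 bits starting at bit 8. Check AMDGPUBaseInfo.cpp for the real layout.
constexpr unsigned kAssumedLgkmcntShift = 8;
constexpr unsigned kAssumedLgkmcntWidth = 4;

static_assert(waitOnlyOn(kAssumedLgkmcntShift, kAssumedLgkmcntWidth) ==
                  0xFFFFF0FFu,
              "only the assumed lgkmcnt field is cleared");
```

Taking the shift and width as parameters keeps the sketch independent of any one chipset, since the field layout differs between hardware generations.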
author: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com> 2024-03-11 10:06:49 -0500
committer: GitHub <noreply@github.com> 2024-03-11 10:06:49 -0500
commit: b05c15259bcbe3eba353b77ca4fc9ec2a81dd3fb (patch)
tree: 7c4636f4c1da61f637f12d9b5bbf68e881b5c3fd /mlir/test/Target
parent: 63af8584fc7ea81ef6f2176e0ada0533a3495745 (diff)
download: llvm-b05c15259bcbe3eba353b77ca4fc9ec2a81dd3fb.zip
llvm-b05c15259bcbe3eba353b77ca4fc9ec2a81dd3fb.tar.gz
llvm-b05c15259bcbe3eba353b77ca4fc9ec2a81dd3fb.tar.bz2
1 files changed, 16 insertions, 0 deletions
diff --git a/mlir/test/Target/LLVMIR/rocdl.mlir b/mlir/test/Target/LLVMIR/rocdl.mlir
index 3ea6292..d35acb0 100644
--- a/mlir/test/Target/LLVMIR/rocdl.mlir
+++ b/mlir/test/Target/LLVMIR/rocdl.mlir
@@ -88,7 +88,23 @@ llvm.func @rocdl.bpermute(%src : i32) -> i32 {
   llvm.return %0 : i32
 }
 
+llvm.func @rocdl.waitcnt() {
+  // CHECK-LABEL: rocdl.waitcnt
+  // CHECK-NEXT: call void @llvm.amdgcn.s.waitcnt(i32 0)
+  rocdl.waitcnt 0
+  llvm.return
+}
+
+llvm.func @rocdl.s.barrier() {
+  // CHECK-LABEL: rocdl.s.barrier
+  // CHECK-NEXT: call void @llvm.amdgcn.s.barrier()
+  rocdl.s.barrier
+  llvm.return
+}
+
+
 llvm.func @rocdl.barrier() {
+  // CHECK-LABEL: rocdl.barrier
   // CHECK:      fence syncscope("workgroup") release
   // CHECK-NEXT: call void @llvm.amdgcn.s.barrier()
   // CHECK-NEXT: fence syncscope("workgroup") acquire