[AMDGPU] Document "relaxed buffer OOB mode", update HSA default

This commit adds documentation for the relaxed-buffer-oob-mode subtarget feature so that users are aware of the performance implications of the change. It also enables relaxed buffer OOB mode for HSA programs, which don't have this correctness requirement.
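
As a rough illustration of the trade-off the documentation in this change describes, the sketch below models which bytes of an access count as in bounds under per-word checking versus strict per-byte checking. It is a simplified model, not the hardware's or the compiler's actual algorithm: the function names, the `num_records` parameter (the buffer size in bytes), and the restriction to non-negative byte offsets are all assumptions made for the example.

```python
def relaxed_in_bounds(offset, size, num_records, word=4):
    """Per-byte validity when each 4-byte word is range-checked as a unit.

    Hypothetical model: a word counts as in bounds only if all of its
    bytes fall inside the buffer; otherwise the whole word is treated
    as out of bounds. Assumes a non-negative offset.
    """
    result = []
    for start in range(offset, offset + size, word):
        end = min(start + word, offset + size)
        ok = end <= num_records
        result.extend([ok] * (end - start))
    return result


def strict_in_bounds(offset, size, num_records):
    """Per-byte validity when every byte is range-checked on its own."""
    return [b < num_records for b in range(offset, offset + size)]


# The <8 x i8> example from the documentation: an 8-byte access
# starting at offset 6 of an 8-byte buffer.
print(relaxed_in_bounds(6, 8, 8))
print(strict_in_bounds(6, 8, 8))
```

In this model the per-word check rejects all eight bytes, while the strict check still accepts the first two. That difference is why strict conformance needs less-vectorized code.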
author: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com> 2025-04-07 21:13:20 +0000
committer: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com> 2025-04-07 21:17:58 +0000
commit: e0f13d7ab0a942c325cf4e1b77c897f21e4c95e0 (patch)
tree: 17f852dd7064d22b5b28bfb8a1fc1cc91a3a8a6f
parent: 9fdac840ec4901a6e3c71249a136cbecc4a9921a (diff)
download: llvm-users/krzysz00/buffer-oob-mode-hsa-default.zip
llvm-users/krzysz00/buffer-oob-mode-hsa-default.tar.gz
llvm-users/krzysz00/buffer-oob-mode-hsa-default.tar.bz2
4 files changed, 44 insertions, 3 deletions
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index d153596..9ca86aa 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1136,6 +1136,41 @@ is conservatively correct for OpenCL.
                              other operations within the same address space.
      ======================= ===================================================
 
+Relaxed Buffer OOB (Out Of Bounds) Mode
+---------------------------------------
+
+Instructions that load from or store to buffer resources (and thus, by extension,
+buffer fat pointers and buffer strided pointers) generally implement handling for
+out of bounds (OOB) memory accesses, including those that are partially OOB,
+if the buffer resource has the required flags set.
+
+When operating on more than 32 bits of data, the ``voffset`` used for the access
+will be range-checked for each 32-bit word independently. This check uses saturating
+arithmetic and interprets the offset as an unsigned value.
+
+The behavior described above conflicts with the ABI requirements of certain graphics
+APIs that require out of bounds accesses to be handled strictly so that accesses
+that begin out of bounds but then access in-bounds elements (such as loading a
+``<4 x i32>`` beginning at offset ``-4``) still load the three in-bounds integers.
+
+Similarly, buffer fat pointers permit operating on types such as ``<8 x i8>``, which
+must be accessed (and bounds-checked) 4 bytes at a time. Non-word-aligned
+accesses to such types from near the end of a buffer resource (such as starting
+a load of an ``<8 x i8>`` from an offset of ``6`` on an 8-byte buffer) will treat
+the initial two bytes to be loaded/stored as out of bounds, even though, under
+a strict interpretation of the bounds-checking semantics, they would be in bounds.
+
+These violations of strict bounds-checking semantics for buffer resources require
+usage of less-vectorized code to ensure correctness. If this strict conformance
+is not required, the target feature ``relaxed-buffer-oob-mode`` should be enabled
+(using ``-mcpu``, ``-offload-arch`` or ``-mattr``).
+
+``relaxed-buffer-oob-mode`` permits the out-of-bounds status of an unaligned memory
+access through a buffer resource to propagate to nearby elements, causing them to become out of bounds as well.
+
+``relaxed-buffer-oob-mode`` is **enabled** on HSA targets by default to preserve
+compute performance and existing ABI expectations.
+
 LLVM IR Intrinsics
 ------------------
 
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 58cf71b..411c469d 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -92,6 +92,11 @@ Changes to the AMDGPU Backend
 
 * Bump the default `.amdhsa_code_object_version` to 6. ROCm 6.3 is required to run any program compiled with COV6.
 
+* Turn on strict buffer OOB checking on non-AMDHSA OSs. This improves the correctness
+  of buffer accesses in some cases at the cost of performance for programs that do not
+  contain unaligned out-of-bounds accesses. The old behavior may be restored with the
+  `relaxed-buffer-oob-mode` feature.
+
 Changes to the ARM Backend
 --------------------------
 
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp b/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
index 53f5c1e..1bd2230 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
@@ -71,7 +71,8 @@ GCNSubtarget &GCNSubtarget::initializeSubtargetDependencies(const Triple &TT,
   // Turn on features that HSA ABI requires. Also turn on FlatForGlobal by
   // default
   if (isAmdHsaOS())
-    FullFS += "+flat-for-global,+unaligned-access-mode,+trap-handler,";
+    FullFS += "+flat-for-global,+unaligned-access-mode,+trap-handler,"
+              "+relaxed-buffer-oob-mode,";
 
   FullFS += "+enable-prt-strict-null,"; // This is overridden by a disable in FS
 
diff --git a/llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-vectors.ll b/llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-vectors.ll
index ede2e40..01239b9 100644
--- a/llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-vectors.ll
+++ b/llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-vectors.ll
@@ -1,5 +1,5 @@
-; RUN: opt -mtriple=amdgcn-amd-amdhsa -passes=load-store-vectorizer -mattr=+relaxed-buffer-oob-mode -S -o - %s | FileCheck --check-prefixes=CHECK,CHECK-OOB-RELAXED %s
-; RUN: opt -mtriple=amdgcn-amd-amdhsa -passes=load-store-vectorizer -S -o - %s | FileCheck --check-prefixes=CHECK,CHECK-OOB-STRICT %s
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -passes=load-store-vectorizer -S -o - %s | FileCheck --check-prefixes=CHECK,CHECK-OOB-RELAXED %s
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -passes=load-store-vectorizer -mattr=-relaxed-buffer-oob-mode -S -o - %s | FileCheck --check-prefixes=CHECK,CHECK-OOB-STRICT %s
 
 target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-ni:7"