Diffstat (limited to 'llvm/docs/AMDGPUUsage.rst')
-rw-r--r-- llvm/docs/AMDGPUUsage.rst 18
1 files changed, 14 insertions, 4 deletions
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index e46437a..c3d4833 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -768,6 +768,9 @@ For example:
performant than code generated for XNACK replay
disabled.
+ cu-stores TODO On GFX12.5, controls whether ``scope:SCOPE_CU`` stores may be used.
+ If disabled, all stores will be done at ``scope:SCOPE_SE`` or greater.
+
=============== ============================ ==================================================
.. _amdgpu-target-id:
@@ -5107,7 +5110,9 @@ The fields used by CP for code objects before V3 also match those specified in
and must be 0,
>454 1 bit ENABLE_SGPR_PRIVATE_SEGMENT
_SIZE
- 457:455 3 bits Reserved, must be 0.
+ 455 1 bit USES_CU_STORES GFX12.5: Whether the ``cu-stores`` target attribute is enabled.
+ If 0, then all stores are ``SCOPE_SE`` or higher.
+ 457:456 2 bits Reserved, must be 0.
458 1 bit ENABLE_WAVEFRONT_SIZE32 GFX6-GFX9
Reserved, must be 0.
GFX10-GFX11
@@ -6358,10 +6363,13 @@ also have to wait on all global memory operations, which is unnecessary.
:doc:`Memory Model Relaxation Annotations <MemoryModelRelaxationAnnotations>` can
be used as an optimization hint for fences to solve this problem.
-The AMDGPU backend recognizes the following tags on fences:
+The AMDGPU backend recognizes the following tags on fences to control which address
+space a fence can synchronize:
+
+- ``amdgpu-synchronize-as:local`` - for the local address space
+- ``amdgpu-synchronize-as:global`` - for the global address space
-- ``amdgpu-as:local`` - fence only the local address space
-- ``amdgpu-as:global``- fence only the global address space
+Multiple tags can be used at the same time to synchronize with more than one address space.
.. note::
@@ -18185,6 +18193,8 @@ terminated by an ``.end_amdhsa_kernel`` directive.
GFX942)
``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX12 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
+ ``.amdhsa_uses_cu_stores`` 0 GFX12.5 Controls USES_CU_STORES in
+ :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
``.amdhsa_wavefront_size32`` Target GFX10-GFX12 Controls ENABLE_WAVEFRONT_SIZE32 in
Feature :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
Specific
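
The kernel descriptor hunk above assigns bit 455 to ``USES_CU_STORES``, next to ``ENABLE_SGPR_PRIVATE_SEGMENT_SIZE`` (bit 454) and ``ENABLE_WAVEFRONT_SIZE32`` (bit 458). As a rough illustration of how a tool might read that bit from a raw descriptor, here is a minimal sketch. It is not part of the patch. It assumes a 64-byte descriptor and that bit 0 is the least significant bit of byte 0; the helper names ``descriptor_bit`` and ``uses_cu_stores`` are hypothetical.

```python
# Sketch only: reads single-bit fields from a raw kernel descriptor.
# Assumptions (not stated in the diff): the descriptor is 64 bytes long and
# bit 0 is the least significant bit of byte 0 (little-endian bit numbering).
KERNEL_DESCRIPTOR_SIZE = 64  # bytes, i.e. 512 bits

# Absolute bit positions taken from the table rows in the diff above.
ENABLE_SGPR_PRIVATE_SEGMENT_SIZE_BIT = 454
USES_CU_STORES_BIT = 455
ENABLE_WAVEFRONT_SIZE32_BIT = 458


def descriptor_bit(descriptor: bytes, bit_index: int) -> int:
    """Return the bit (0 or 1) at an absolute bit position in the descriptor."""
    if len(descriptor) != KERNEL_DESCRIPTOR_SIZE:
        raise ValueError("expected a 64-byte kernel descriptor")
    if not 0 <= bit_index < 8 * KERNEL_DESCRIPTOR_SIZE:
        raise ValueError("bit index out of range")
    # Split the absolute position into a byte offset and a bit within that byte.
    byte_index, bit_in_byte = divmod(bit_index, 8)
    return (descriptor[byte_index] >> bit_in_byte) & 1


def uses_cu_stores(descriptor: bytes) -> bool:
    """Hypothetical helper: True if USES_CU_STORES (bit 455) is set."""
    return descriptor_bit(descriptor, USES_CU_STORES_BIT) == 1
```

Under these assumptions, bit 455 lands in byte 56 at bit position 7, so setting ``0x80`` in byte 56 of an otherwise zeroed descriptor makes ``uses_cu_stores`` return ``True``. Per the patch, a value of 0 means all stores are ``SCOPE_SE`` or higher.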