author: Alex Voicu <alexandru.voicu@amd.com> 2025-07-23 13:55:46 +0100
committer: GitHub <noreply@github.com> 2025-07-23 13:55:46 +0100
commit: 01e23c3d626c30000820465f029793e44e2062e4 (patch)
tree: 860c70a6f6fa2078137b2bc042d78710641bdd33 /clang
parent: efa25c4737440887772e6c6ed72029afa0bf05ca (diff)
[HIPSTDPAR] Add support for globals (#146813)
This (mostly) removes one of the largest remaining limitations of `hipstdpar` based algorithm acceleration, by adding support for global variable usage in offloaded algorithms. It is meant to compose with a run time component that will live in the support library, and fires iff a special variable is provided by the latter. In short, things work as follows:

- We replace uses of some global `G` with an indirect access via an implicitly created anonymous global `F`, which is of pointer type and is expected to hold the program-wide address of `G`;
- We append `F`, alongside `G`'s name, to a table structure;
- At run-time, the support library uses the table to look up the program-wide address of a contained symbol based on its name, and then stores the address via the paired pointer.

This doesn't handle internal linkage symbols (`static foo` or `namespace { foo }`) if they are not unique, i.e. if there's a name clash that is solved by the linker, as the resolution would not be visible. Also, initially we will only support "true" globals in RDC mode. Things would be much simpler if we had direct access to the accelerator loader, but since the expectation is to compose at the HIP RT level we have to jump through additional hoops.
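The indirection scheme described in the commit message can be illustrated with a small host-only C++ sketch. It is only a model of the idea, not the code the compiler or the support library actually emits: `IndirectionEntry`, `indirection_table`, `SymbolMap`, `resolve_indirections` and `read_G_via_indirection` are hypothetical names introduced for illustration, and a plain `std::unordered_map` stands in for whatever name-based lookup the run time component performs.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <unordered_map>

// A "true" global that an offloaded algorithm wants to use by name.
int G = 42;

// Stand-in for the implicitly created global F: a pointer that is expected
// to hold the program-wide address of G once it has been resolved.
void* F = nullptr;

// Hypothetical table entry pairing a symbol name with its pointer slot.
struct IndirectionEntry {
    std::string_view name;  // name of the original global
    void** slot;            // address of the paired pointer (here: &F)
};

// Hypothetical table to which (name, pointer) pairs are appended.
IndirectionEntry indirection_table[] = {{"G", &F}};

// Assumption: a simple map models the program-wide, name-based symbol lookup.
using SymbolMap = std::unordered_map<std::string_view, void*>;

// Models the run-time step: look each name up and store the resulting
// address through the paired pointer. Returns how many entries were resolved.
std::size_t resolve_indirections(const SymbolMap& symbols) {
    std::size_t resolved = 0;
    for (const IndirectionEntry& entry : indirection_table) {
        auto it = symbols.find(entry.name);
        if (it == symbols.end()) continue;  // unknown symbol: slot stays null
        *entry.slot = it->second;
        ++resolved;
    }
    return resolved;
}

// What a use of G conceptually becomes after the rewrite: a load through F.
int read_G_via_indirection() { return *static_cast<int*>(F); }
```

Once `resolve_indirections` has been called with a map containing `G`'s address, `read_G_via_indirection()` observes the current value of `G`, including later writes to it. The sketch also hints at why non-unique internal linkage symbols are problematic: a lookup keyed purely on the name cannot tell two distinct `foo` objects apart.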
Diffstat (limited to 'clang')
-rw-r--r-- clang/docs/HIPSupport.rst | 68
1 files changed, 19 insertions, 49 deletions
diff --git a/clang/docs/HIPSupport.rst b/clang/docs/HIPSupport.rst
index 406e1c8..b4a671e 100644
--- a/clang/docs/HIPSupport.rst
+++ b/clang/docs/HIPSupport.rst
@@ -545,37 +545,22 @@ The following restrictions imposed on user code apply to both modes:
1. Pointers to function, and all associated features, such as e.g. dynamic
polymorphism, cannot be used (directly or transitively) by the user provided
callable passed to an algorithm invocation;
-2. Global / namespace scope / ``static`` / ``thread`` storage duration variables
- cannot be used (directly or transitively) in name by the user provided
- callable;
-
- - When executing in **HMM Mode** they can be used in address e.g.:
-
- .. code-block:: C++
-
- namespace { int foo = 42; }
-
- bool never(const std::vector<int>& v) {
- return std::any_of(std::execution::par_unseq, std::cbegin(v), std::cend(v), [](auto&& x) {
- return x == foo;
- });
- }
-
- bool only_in_hmm_mode(const std::vector<int>& v) {
- return std::any_of(std::execution::par_unseq, std::cbegin(v), std::cend(v),
- [p = &foo](auto&& x) { return x == *p; });
- }
-
-3. Only algorithms that are invoked with the ``parallel_unsequenced_policy`` are
+2. ``static`` (except for program-wide unique ones) / ``thread`` storage
+ duration variables cannot be used (directly or transitively) in name by the
+ user provided callable;
+3. User code must be compiled in ``-fgpu-rdc`` mode in order for global /
+ namespace scope variables / program-wide unique ``static`` storage duration
+ variables to be usable in name by the user provided callable;
+4. Only algorithms that are invoked with the ``parallel_unsequenced_policy`` are
candidates for offload;
-4. Only algorithms that are invoked with iterator arguments that model
+5. Only algorithms that are invoked with iterator arguments that model
`random_access_iterator <https://en.cppreference.com/w/cpp/iterator/random_access_iterator>`_
are candidates for offload;
-5. `Exceptions <https://en.cppreference.com/w/cpp/language/exceptions>`_ cannot
+6. `Exceptions <https://en.cppreference.com/w/cpp/language/exceptions>`_ cannot
be used by the user provided callable;
-6. Dynamic memory allocation (e.g. ``operator new``) cannot be used by the user
+7. Dynamic memory allocation (e.g. ``operator new``) cannot be used by the user
provided callable;
-7. Selective offload is not possible i.e. it is not possible to indicate that
+8. Selective offload is not possible i.e. it is not possible to indicate that
only some algorithms invoked with the ``parallel_unsequenced_policy`` are to
be executed on the accelerator.
@@ -585,15 +570,6 @@ additional restrictions:
1. All code that is expected to interoperate has to be recompiled with the
``--hipstdpar-interpose-alloc`` flag i.e. it is not safe to compose libraries
that have been independently compiled;
-2. automatic storage duration (i.e. stack allocated) variables cannot be used
- (directly or transitively) by the user provided callable e.g.
-
- .. code-block:: c++
-
- bool never(const std::vector<int>& v, int n) {
- return std::any_of(std::execution::par_unseq, std::cbegin(v), std::cend(v),
- [p = &n](auto&& x) { return x == *p; });
- }
Current Support
===============
@@ -626,17 +602,12 @@ Linux operating system. Support is synthesised in the following table:
The minimum Linux kernel version for running in HMM mode is 6.4.
-The forwarding header can be obtained from
-`its GitHub repository <https://github.com/ROCm/roc-stdpar>`_.
-It will be packaged with a future `ROCm <https://rocm.docs.amd.com/en/latest/>`_
-release. Because accelerated algorithms are provided via
-`rocThrust <https://rocm.docs.amd.com/projects/rocThrust/en/latest/>`_, a
-transitive dependency on
-`rocPrim <https://rocm.docs.amd.com/projects/rocPRIM/en/latest/>`_ exists. Both
-can be obtained either by installing their associated components of the
-`ROCm <https://rocm.docs.amd.com/en/latest/>`_ stack, or from their respective
-repositories. The list algorithms that can be offloaded is available
-`here <https://github.com/ROCm/roc-stdpar#algorithm-support-status>`_.
+The forwarding header is packaged by
+`ROCm <https://rocm.docs.amd.com/en/latest/>`_, and is obtainable by installing
+the `hipstdpar` packege. The list algorithms that can be offloaded is available
+`here <https://github.com/ROCm/roc-stdpar#algorithm-support-status>`_. More
+details are available via the dedicated blog
+`<https://rocm.blogs.amd.com/software-tools-optimization/hipstdpar/README.html>`_.
HIP Specific Elements
---------------------
@@ -690,9 +661,8 @@ HIP Specific Elements
Open Questions / Future Developments
====================================
-1. The restriction on the use of global / namespace scope / ``static`` /
- ``thread`` storage duration variables in offloaded algorithms will be lifted
- in the future, when running in **HMM Mode**;
+1. The restriction on the use of ``static`` / ``thread`` storage duration
+ variables in offloaded algorithms might be lifted;
2. The restriction on the use of dynamic memory allocation in offloaded
algorithms will be lifted in the future.
3. The restriction on the use of pointers to function, and associated features