author | Guray Ozen <guray.ozen@gmail.com> | 2023-11-27 11:05:07 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-11-27 11:05:07 +0100 |
commit | edf5cae7391cdb097a090ea142dfa7ac6ac03555 (patch) | |
tree | 423383047badea2aa92ebc6e60cd0ced1cea9c85 /llvm/lib/Analysis/LoopAccessAnalysis.cpp | |
parent | d1652ff0803ac9f2f3ea99336f71edacdf95a721 (diff) | |
[mlir][gpu] Support Cluster of Thread Blocks in `gpu.launch_func` (#72871)
The NVIDIA Hopper architecture introduced the Cooperative Group Array
(CGA), a new level of parallelism that groups Cooperative Thread Arrays
(CTAs) into clusters. CTAs in the same cluster run concurrently and can
synchronize and communicate through shared memory.
This PR adds CGA support to `gpu.launch_func` in the GPU dialect.
Because the GPU dialect remains architecture-agnostic, the cluster
dimensions are added as optional parameters. Extending
`gpu.launch_func` lets us reuse mechanisms the GPU dialect already
has, such as outlining and kernel launching, which makes it a
practical and convenient choice.
For example:
```
gpu.launch_func @kernel_module::@kernel
    clusters in (%1, %0, %0) // <-- Optional
    blocks in (%0, %0, %0)
    threads in (%0, %0, %0)
```
The PR also introduces cluster-specific index and dimension ops and
binds them to NVVM ops:
```
%cidX = gpu.cluster_id x
%cidY = gpu.cluster_id y
%cidZ = gpu.cluster_id z
%cdimX = gpu.cluster_dim x
%cdimY = gpu.cluster_dim y
%cdimZ = gpu.cluster_dim z
```
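The two snippets above can be combined into one module. The following
is a minimal sketch, not taken from the PR's tests: the names `@main`,
`%c1`, and `%c2` are made up for illustration, and it assumes that
`clusters in` gives the cluster size in blocks, so the block counts
are chosen as multiples of the cluster sizes, as CUDA requires.

```mlir
module attributes {gpu.container_module} {
  gpu.module @kernel_module {
    gpu.func @kernel() kernel {
      // Cluster index and dimension along x; both results are of type index.
      %cidX = gpu.cluster_id x
      %cdimX = gpu.cluster_dim x
      gpu.return
    }
  }

  // Hypothetical host function; only the launch itself follows the PR.
  func.func @main() {
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    // Two blocks along x, grouped into a cluster of size two along x
    // (assumed meaning of `clusters in`, see above).
    gpu.launch_func @kernel_module::@kernel
        clusters in (%c2, %c1, %c1)
        blocks in (%c2, %c1, %c1)
        threads in (%c1, %c1, %c1)
    return
  }
}
```

Dropping the `clusters in` line should leave a regular launch, since
the cluster parameters are optional.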
Cluster support for the `gpu.launch` op will follow in an upcoming PR.
See [the
documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays)
provided by NVIDIA for details.