path: root/llvm/lib/Analysis/LoopAccessAnalysis.cpp
author: Guray Ozen <guray.ozen@gmail.com> 2023-11-27 11:05:07 +0100
committer: GitHub <noreply@github.com> 2023-11-27 11:05:07 +0100
commit: edf5cae7391cdb097a090ea142dfa7ac6ac03555
tree: 423383047badea2aa92ebc6e60cd0ced1cea9c85 /llvm/lib/Analysis/LoopAccessAnalysis.cpp
parent: d1652ff0803ac9f2f3ea99336f71edacdf95a721
[mlir][gpu] Support Cluster of Thread Blocks in `gpu.launch_func` (#72871)
NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA). It is a new level of parallelism, allowing clustering of Cooperative Thread Arrays (CTA) to synchronize and communicate through shared memory while running concurrently.

This PR enables support for CGA within the `gpu.launch_func` in the GPU dialect. It extends `gpu.launch_func` to accommodate this functionality.

The GPU dialect remains architecture-agnostic, so we've added CGA functionality as optional parameters. We want to leverage mechanisms that we have in the GPU dialects such as outlining and kernel launching, making it a practical and convenient choice.

An example of this implementation can be seen below:

```
gpu.launch_func @kernel_module::@kernel
                clusters in (%1, %0, %0) // <-- Optional
                blocks in (%0, %0, %0)
                threads in (%0, %0, %0)
```

The PR also introduces index and dimensions Ops specific to clusters, binding them to NVVM Ops:

```
%cidX = gpu.cluster_id x
%cidY = gpu.cluster_id y
%cidZ = gpu.cluster_id z

%cdimX = gpu.cluster_dim x
%cdimY = gpu.cluster_dim y
%cdimZ = gpu.cluster_dim z
```

We will introduce cluster support in `gpu.launch` Op in an upcoming PR.

See [the documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays) provided by NVIDIA for details.
Diffstat (limited to 'llvm/lib/Analysis/LoopAccessAnalysis.cpp')
0 files changed, 0 insertions, 0 deletions