author | Guray Ozen <guray.ozen@gmail.com> | 2023-08-09 17:56:59 +0200 |
---|---|---|
committer | Guray Ozen <guray.ozen@gmail.com> | 2023-08-09 23:08:00 +0200 |
commit | 18e161f9e15b036faf48bfd8813d9330e06e2ee3 (patch) | |
tree | 32a5c253dd30b3766729e033a2118b42c2e0f78a /llvm/lib/Transforms/Utils/InlineFunction.cpp | |
parent | 90dbd5860bf650c769e84172e0835901396d81a2 (diff) | |
[MLIR][NVVM] Introduction of the `wgmma.mma_async` Op
This work introduces the `wgmma.mma_async` Op along with PTX generation using `BasicPtxBuilderOpInterface`. The Op is designed to execute the matrix multiply-and-accumulate operation across a warpgroup (128 threads). Note that this operation is supported on devices with the sm_90a capability.
The matrix multiply-and-accumulate operation can take one of the following forms. In both cases, matrix D is referred to as the accumulator:
D = A * B + D : The product A * B is added to the accumulator matrix D.
D = A * B : The input from the accumulator matrix D is not used.
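The two forms above can be sketched as a scalar reference computation. This is only an illustration of the arithmetic, not the Op's lowering or the generated PTX; the function name `mma_reference` and the `accumulate` flag are hypothetical and do not appear in the patch.

```python
def mma_reference(a, b, d, accumulate=True):
    """Reference semantics for the two forms (hypothetical helper).

    accumulate=True  -> returns A * B + D
    accumulate=False -> returns A * B (the incoming D is ignored)
    Matrices are plain lists of rows; no warpgroup behavior is modeled.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Start from the accumulator value only in the D = A * B + D form.
            acc = d[i][j] if accumulate else 0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = acc
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
d = [[1, 1], [1, 1]]

with_acc = mma_reference(a, b, d, accumulate=True)      # D = A * B + D
without_acc = mma_reference(a, b, d, accumulate=False)  # D = A * B
```

In both calls the result plays the role of D; the only difference is whether the prior contents of D contribute to it.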
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D157370