| author | Guray Ozen <guray.ozen@gmail.com> | 2023-10-05 10:54:13 +0200 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2023-10-05 10:54:13 +0200 |
| commit | d20fbc900783db3a87c8da622ede280d93f890bb (patch) | |
| tree | c7544f804fe188debb6be12650d902f6ec4388e0 /clang/unittests/Lex/ModuleDeclStateTest.cpp | |
| parent | c7d6d62252e356be096462fc8b416bd6f725afbd (diff) | |
[MLIR][NVGPU] Introduce `nvgpu.warpgroup.mma.store` Op for Hopper GPUs (#65441)
This PR introduces a new op, `nvgpu.warpgroup.mma.store`, to the NVGPU
dialect of MLIR. Its purpose is to store the fragmented result(s) of type
`nvgpu.warpgroup.accumulator` produced by `nvgpu.warpgroup.mma` into a
given memref.
An example of a fragmented matrix layout is given here:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#wgmma-64n16-d
The `nvgpu.warpgroup.mma.store` op does the following:
1) Takes one or more values of `nvgpu.warpgroup.accumulator` type (the
fragmented result matrices).
2) Calculates per-thread indexes within the warpgroup and stores the data
into the given memref (sketched after the example below).
Here's an example usage:
```mlir
// A warpgroup performs GEMM, results in fragmented matrix
%result1, %result2 = nvgpu.warpgroup.mma ...
// Stores the fragmented result to memref
nvgpu.warpgroup.mma.store [%result1, %result2], %matrixD :
!nvgpu.warpgroup.accumulator< fragmented = vector<64x128xf32>>,
!nvgpu.warpgroup.accumulator< fragmented = vector<64x128xf32>>
to memref<128x128xf32,3>
```
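For intuition about step 2, here is a minimal CUDA sketch of the per-thread index arithmetic implied by the wgmma 64xN f32 accumulator layout linked above. This is not the code added by this PR (the actual lowering emits equivalent index computations during the NVGPU-to-NVVM conversion); the function name, the `N` constant, and the register ordering within `frag` are illustrative assumptions.

```cuda
// Illustrative sketch only: where each thread of a 128-thread warpgroup
// writes its share of a 64xN f32 wgmma accumulator into a row-major matrix.
#define N 128  // number of accumulator columns (assumption for this example)

__device__ void store_warpgroup_accumulator(const float frag[N / 2],
                                            float *matrixD, int ldD) {
  int tid    = threadIdx.x % 128;  // thread id within its 128-thread warpgroup
  int warpId = tid / 32;           // each warp owns a 16-row slice of the tile
  int laneId = tid % 32;

  int idx = 0;
  for (int j = 0; j < N / 8; ++j) {    // 8-column groups
    for (int r = 0; r < 2; ++r) {      // two rows held per thread and group
      for (int c = 0; c < 2; ++c) {    // two adjacent columns per row
        int row = warpId * 16 + laneId / 4 + 8 * r;
        int col = j * 8 + (laneId % 4) * 2 + c;
        matrixD[row * ldD + col] = frag[idx++];
      }
    }
  }
}
```

Under this layout, each of the four warps in the warpgroup covers a 16-row slice of the 64-row tile, and each thread ends up holding N/2 of the f32 results, which is why the op needs the per-thread index calculation rather than a plain vector store.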
Diffstat (limited to 'clang/unittests/Lex/ModuleDeclStateTest.cpp')
0 files changed, 0 insertions, 0 deletions