riscv-gnu-toolchain/llvm.git

author	Guray Ozen <guray.ozen@gmail.com>	2023-08-22 12:33:36 +0200
committer	Guray Ozen <guray.ozen@gmail.com>	2023-08-22 16:12:25 +0200
commit	cce3e8ed895b2d4c1396929c363c071e15fdbf8b (patch)
tree	84cea5a4ac49e0519e2527678ecbb5f621ec56e7 /clang/lib/ExtractAPI/Serialization/SymbolGraphSerializer.cpp
parent	f740bcb3707a17ed4ccd52157089011a586cc2a6 (diff)

[MLIR][NVGPU] Introduction of wgmma.generate.descriptor Op

This work introduces a new Op, `wgmma.generate.descriptor`, designed to create a wgmma descriptor for inputs of matrix multiply and accumulate operations using `wgmma.mma_async` PTX instruction.

The descriptor format specifications can be found in the following link:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#asynchronous-warpgroup-level-matrix-shared-memory-layout-matrix-descriptor

It's important to note that this op is in its initial phase, and it does come with certain limitations. It only supports 128b swizzling and does not incorporate interleaving. In the future, different calculations will be addressed in separate works, expanding the capabilities of the op.

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D157382
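For readers unfamiliar with the descriptor mentioned in the commit message, the following is a minimal sketch of how a 64-bit wgmma shared-memory matrix descriptor could be packed by hand. It is not the lowering added by this commit. The field positions (start address in bits [0,14), leading dimension byte offset in [16,30), stride dimension byte offset in [32,46), swizzle mode in [62,64)) and the `(x & 0x3FFFF) >> 4` field encoding are a reading of the PTX documentation linked above and should be checked against it. The names `encodeField`, `Swizzle`, and `makeWgmmaDescriptor` are hypothetical, and the base offset field is left at zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of MLIR): encode a shared-memory byte
// address or byte offset into a 14-bit descriptor field.
// Assumption from the PTX docs: encode(x) = (x & 0x3FFFF) >> 4.
constexpr uint64_t encodeField(uint64_t x) { return (x & 0x3FFFFull) >> 4; }

// Assumed swizzle-mode encoding for bits [62,64) of the descriptor.
enum class Swizzle : uint64_t { None = 0, B128 = 1, B64 = 2, B32 = 3 };

// Hypothetical sketch of packing the descriptor. The op in this commit
// only supports 128-byte swizzling, so that is the default here.
constexpr uint64_t makeWgmmaDescriptor(uint64_t smemAddr,
                                       uint64_t leadingByteOffset,
                                       uint64_t strideByteOffset,
                                       Swizzle swizzle = Swizzle::B128) {
  uint64_t desc = 0;
  desc |= encodeField(smemAddr);                 // bits [0,14):  start address
  desc |= encodeField(leadingByteOffset) << 16;  // bits [16,30): leading dim offset
  desc |= encodeField(strideByteOffset) << 32;   // bits [32,46): stride dim offset
  desc |= static_cast<uint64_t>(swizzle) << 62;  // bits [62,64): swizzle mode
  return desc;
}
```

A call such as `makeWgmmaDescriptor(0x1000, 16, 1024)` packs a 16-byte-aligned shared-memory address together with the two byte offsets. Because the encoding drops the low four bits, addresses and offsets that are not multiples of 16 would lose information in this sketch.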

Diffstat (limited to 'clang/lib/ExtractAPI/Serialization/SymbolGraphSerializer.cpp')

0 files changed, 0 insertions, 0 deletions