rocket-tools/riscv-gnu-toolchain/llvm.git

author	Kirill Vedernikov <kvedernikov@nvidia.com>	2026-02-25 14:25:05 +0100
committer	GitHub <noreply@github.com>	2026-02-25 14:25:05 +0100
commit	ee34eb6edccdebc2a752ffecdde5faae6b0d5593 (patch)
tree	f0a33e3aa63b6694bd461df131f465173295fec5 /offload/test/offloading/fortran/target-parallel-do-collapse.f90
parent	717a9ab442a44a595d7bb422e78721e8f65ae8a9 (diff)

[MLIR][NVVM] Fix kFactor for fp8/fp6/fp4 types in MmaSpOp verifier. Improve mma tests. (#183133)

Fix an incorrect kFactor value for the e4m3/e5m2, e3m2/e2m3, and e2m1 types in MmaSpOp::verify(). The kFactor for these types was set to 32 but should be 16.

kFactor is used to compute the expected number of operand A/B register fragments. With kFactor=32 (wrong) and m16n8k64, the only shape allowed for these types, the expected fragment count was incorrect. With kFactor=16 (correct), the count matches the PTX ISA definition of mma.sp with fp8/fp6/fp4 A/B operands.

PTX ISA reference: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-sparse-mma

Also improve the existing MLIR dialect tests for nvvm.mma.sp.sync and add new mlir-translate tests covering the mma, mma.sp, and blockscale variants.
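To make the fragment arithmetic concrete, here is a minimal Python sketch. It assumes a per-operand count of (m/8)*(k/kFactor) fragments for B and half that for the 2:4-sparse A operand, modeled on how kFactor is used in the MMA verifiers. It does not copy MmaSpOp::verify(). It also assumes fp6/fp4 elements sit in 8-bit containers like fp8. The function names `fragments_from_kfactor` and `fragments_from_bits` are hypothetical. The second function cross-checks the result by spreading the operand's total bits over a warp's 32-bit registers.

```python
# Hypothetical sketch: expected per-thread 32-bit register fragments for the
# A and B operands of a sparse MMA. The kFactor-based formula is an assumption
# modeled on the MMA verifiers, not the actual MmaSpOp::verify() code.

WARP_SIZE = 32  # threads per warp
REG_BITS = 32   # each A/B fragment is assumed to be one 32-bit register


def fragments_from_kfactor(m: int, n: int, k: int, k_factor: int) -> tuple[int, int]:
    """Fragment counts for A and B; A is 2:4 sparse, so its count is halved."""
    frags_a = (m // 8) * (k // k_factor) // 2
    frags_b = (n // 8) * (k // k_factor)
    return frags_a, frags_b


def fragments_from_bits(m: int, n: int, k: int, container_bits: int) -> tuple[int, int]:
    """Same counts derived directly: total operand bits divided across the warp."""
    bits_per_warp_reg = WARP_SIZE * REG_BITS  # 1024 bits per register slot
    frags_a = (m * (k // 2) * container_bits) // bits_per_warp_reg  # compressed A
    frags_b = (k * n * container_bits) // bits_per_warp_reg
    return frags_a, frags_b


# fp8 elements use 8 bits; fp6/fp4 are assumed to be padded to 8-bit containers.
m, n, k = 16, 8, 64
print("kFactor=16:", fragments_from_kfactor(m, n, k, 16))  # (4, 4)
print("kFactor=32:", fragments_from_kfactor(m, n, k, 32))  # (2, 2)
print("from bits: ", fragments_from_bits(m, n, k, 8))      # (4, 4)
```

Under these assumptions, kFactor=16 agrees with the direct bit count at m16n8k64 (4 fragments each for A and B). kFactor=32 halves both counts, which is consistent with the commit's statement that the old value produced an incorrect fragment count.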

Diffstat (limited to 'offload/test/offloading/fortran/target-parallel-do-collapse.f90')

0 files changed, 0 insertions, 0 deletions