| author | Kirill Vedernikov <kvedernikov@nvidia.com> | 2026-02-25 14:25:05 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2026-02-25 14:25:05 +0100 |
| commit | ee34eb6edccdebc2a752ffecdde5faae6b0d5593 (patch) | |
| tree | f0a33e3aa63b6694bd461df131f465173295fec5 /offload/test/offloading/fortran/target-parallel-do-collapse.f90 | |
| parent | 717a9ab442a44a595d7bb422e78721e8f65ae8a9 (diff) | |
[MLIR][NVVM] Fix kFactor for fp8/fp6/fp4 types in MmaSpOp verifier. Improve mma tests. (#183133)
Fix an incorrect kFactor value for e4m3/e5m2, e3m2/e2m3, e2m1 types in
MmaSpOp::verify(). The kFactor for these types was set to 32 but should
be 16.
kFactor is used to compute the expected number of operand A/B register
fragments. With kFactor=32 (wrong) and the only allowed shape m16n8k64,
the fragment count was incorrect. With kFactor=16 (correct), it matches
the PTX ISA definition for mma.sp with fp8/fp6/fp4 A/B operands.
PTX ISA reference:
[https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-sparse-mma](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-sparse-mma)
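The effect of kFactor on the fragment count can be illustrated with a small sketch. This is not the code in MmaSpOp::verify(); the helper names `denseFragments` and `sparseAFragments` are hypothetical, and the formula (8-row tiles, K split into kFactor-wide slices, sparse A stored at half size) is an assumption about how the verifier derives the counts. The expected value of four registers for both A and B at m16n8k64 is likewise an assumed reading of the PTX ISA section linked above.

```cpp
#include <cstdint>

// Hypothetical helper, not the actual verifier code.
// Assumption: one register fragment covers an 8-row tile and a
// kFactor-wide slice of the K dimension.
constexpr int64_t denseFragments(int64_t rows, int64_t k, int64_t kFactor) {
  return (rows / 8) * (k / kFactor);
}

// Assumption: the sparse A operand stores only half of its elements,
// so it needs half as many fragments as a dense operand of the same shape.
constexpr int64_t sparseAFragments(int64_t m, int64_t k, int64_t kFactor) {
  return denseFragments(m, k, kFactor) / 2;
}

// Shape m16n8k64, the only shape allowed for these types.
// kFactor = 16 gives four fragments each for A and B.
static_assert(sparseAFragments(16, 64, 16) == 4, "A fragments, kFactor=16");
static_assert(denseFragments(8, 64, 16) == 4, "B fragments, kFactor=16");

// kFactor = 32 gives only two fragments each under the same formula.
static_assert(sparseAFragments(16, 64, 32) == 2, "A fragments, kFactor=32");
static_assert(denseFragments(8, 64, 32) == 2, "B fragments, kFactor=32");
```

Under these assumptions, doubling kFactor from 16 to 32 halves the expected fragment count for both operands, which is the kind of mismatch the fix addresses.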
Also improve existing MLIR dialect tests for nvvm.mma.sp.sync and add
new mlir-translate tests covering mma, mma.sp, and blockscale variants.
Diffstat (limited to 'offload/test/offloading/fortran/target-parallel-do-collapse.f90')
0 files changed, 0 insertions, 0 deletions
