path: root/offload/test/offloading/fortran/target-parallel-do-collapse.f90
author: Kirill Vedernikov <kvedernikov@nvidia.com> 2026-02-25 14:25:05 +0100
committer: GitHub <noreply@github.com> 2026-02-25 14:25:05 +0100
commit: ee34eb6edccdebc2a752ffecdde5faae6b0d5593 (patch)
tree: f0a33e3aa63b6694bd461df131f465173295fec5 /offload/test/offloading/fortran/target-parallel-do-collapse.f90
parent: 717a9ab442a44a595d7bb422e78721e8f65ae8a9 (diff)
[MLIR][NVVM] Fix kFactor for fp8/fp6/fp4 types in MmaSpOp verifier. Improve mma tests. (#183133) (HEAD, main)
Fix an incorrect kFactor value for the e4m3/e5m2, e3m2/e2m3, and e2m1 types in MmaSpOp::verify(). The kFactor for these types was set to 32 but should be 16.

kFactor is used to compute the expected number of operand A/B register fragments. With kFactor=32 (wrong) and the only allowed shape m16n8k64, the fragment count was incorrect. With kFactor=16 (correct), it matches the PTX ISA definition for mma.sp with fp8/fp6/fp4 A/B operands.

PTX ISA reference: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-sparse-mma

Also improve existing MLIR dialect tests for nvvm.mma.sp.sync and add new mlir-translate tests covering mma, mma.sp, and blockscale variants.
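
The arithmetic behind the kFactor change can be checked with a rough sketch. It is not the verifier code: the helper names are hypothetical, and it assumes that (a) the verifier derives the fragment count as k / kFactor, which the commit message does not spell out, (b) fp8/fp6/fp4 elements each occupy an 8-bit container for these operands, and (c) sparse operand A stores only half of the K dimension (2:4 compression).

```python
# Sketch only: names and the k / kFactor formula are illustrative assumptions,
# not taken from MmaSpOp::verify().

WARP_SIZE = 32       # threads cooperating on one warp-level mma
REGISTER_BITS = 32   # each fragment is one 32-bit register


def registers_per_thread(rows, cols, element_bits):
    """32-bit registers each thread holds for a rows x cols operand tile."""
    total_bits = rows * cols * element_bits
    return total_bits // (WARP_SIZE * REGISTER_BITS)


def fragments_from_k_factor(k, k_factor):
    """Assumed relation between kFactor and the expected fragment count."""
    return k // k_factor


M, N, K = 16, 8, 64   # the only shape the commit message lists for these types
CONTAINER_BITS = 8    # assumption: fp8/fp6/fp4 each sit in an 8-bit container

# Sparse A keeps K/2 elements per row; B is dense.
a_regs = registers_per_thread(M, K // 2, CONTAINER_BITS)
b_regs = registers_per_thread(K, N, CONTAINER_BITS)

new_count = fragments_from_k_factor(K, 16)  # value after the fix
old_count = fragments_from_k_factor(K, 32)  # value before the fix

print(a_regs, b_regs, new_count, old_count)
```

Under these assumptions, the size-based count and the kFactor=16 count agree for m16n8k64, while kFactor=32 yields half as many fragments.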
Diffstat (limited to 'offload/test/offloading/fortran/target-parallel-do-collapse.f90')
0 files changed, 0 insertions, 0 deletions