author | Guray Ozen <guray.ozen@gmail.com> | 2023-11-10 16:52:00 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-11-10 16:52:00 +0100 |
commit | a00caad6bf318a7497d477b434464ca75ecb41fc (patch) | |
tree | 3168423fcb6e22f7d92ac5bcf3ccf3fb484a66ef /llvm/lib/Support/CommandLine.cpp | |
parent | 4ba50a783b4d6cbfc93d989e4d2bdfe4e4726e4a (diff) | |
[mlir] Add sm_90a GEMM test 128x128x128 (F32 =F16*F16) with predicate (#70028)
PR #69913 added a GEMM test (128x128x128 F32 += F16 * F16) that uses an
if-statement. This PR adds the same test using predicates in PTX.
Predicate support is enabled using _BasicPtxBuilderInterface_
`(nvgpu.opcode ..., predicate = %pred)`.
The predicate condition is computed in `Step 2. [GPU] Elect fastest
thread in CTA`, inspired by CUTLASS. It is as follows:
```
lane_predicate = nvvm.elect.sync
warp_idx = __shfl_sync(0xffffffff, threadIdx.x / 32, 0)
warp_idx_in_warp_group = warp_idx % 4
predicate = (lane_predicate & warp_idx_in_warp_group)
```
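The election above can be sketched as a host-side Python simulation, to
make it easier to see which threads end up with the predicate set. This
is only an illustration, not the code from the test. It rests on two
assumptions: that `elect.sync` on a fully active warp picks the lowest
lane, and that the final combination is meant as the CUTLASS-style check
`warp_idx_in_warp_group == 0` ANDed with the lane predicate. The helper
names (`elect_sync`, `predicate`, `elected_threads`) are hypothetical.

```python
WARP_SIZE = 32
WARPS_PER_WARP_GROUP = 4


def elect_sync(lane_id):
    # Assumption: with all 32 lanes active, the elected leader is lane 0.
    # The hardware only promises one deterministic leader per member mask.
    return lane_id == 0


def predicate(thread_idx_x):
    lane_id = thread_idx_x % WARP_SIZE
    lane_predicate = elect_sync(lane_id)
    # __shfl_sync(0xffffffff, threadIdx.x / 32, 0) broadcasts lane 0's value,
    # so every lane of a warp sees the same warp index.
    warp_idx = thread_idx_x // WARP_SIZE
    warp_idx_in_warp_group = warp_idx % WARPS_PER_WARP_GROUP
    # Assumption: the predicate selects the first warp of each warp group.
    return lane_predicate and warp_idx_in_warp_group == 0


def elected_threads(block_dim_x):
    # Thread indices in a 1-D CTA for which the predicate is true.
    return [t for t in range(block_dim_x) if predicate(t)]
```

Under these assumptions a CTA of 128 threads (one warp group) has a
single elected thread, and a CTA of 256 threads has one per warp group.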
Depends on #70027 #69934 #69935 #69584