author Guray Ozen <guray.ozen@gmail.com> 2023-11-10 16:52:00 +0100
committer GitHub <noreply@github.com> 2023-11-10 16:52:00 +0100
commit a00caad6bf318a7497d477b434464ca75ecb41fc
tree 3168423fcb6e22f7d92ac5bcf3ccf3fb484a66ef /llvm/lib/Support/CommandLine.cpp
parent 4ba50a783b4d6cbfc93d989e4d2bdfe4e4726e4a
[mlir] Add sm_90a GEMM test 128x128x128 (F32 += F16 * F16) with predicate (#70028)
PR #69913 added a GEMM test (128x128x128, F32 += F16 * F16) that uses an if-statement. This PR adds the same test using predicates in PTX instead. Predicate support is enabled through _BasicPtxBuilderInterface_ `(nvgpu.opcode ..., predicate = %pred)`.

The predicate condition is computed in `Step 2. [GPU] Elect fastest thread in CTA`, following the approach used in CUTLASS:

```
lane_predicate = nvvm.elect.sync
warp_idx = __shfl_sync(0xffffffff, threadIdx.x / 32, 0)
warp_idx_in_warp_group = warp_idx % 4
predicate = lane_predicate & (warp_idx_in_warp_group == 0)
```

Depends on #70027, #69934, #69935, #69584.
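To sanity-check which threads the predicate selects, here is a small host-side Python sketch that models the computation for a 1-D thread block. It is an illustration under stated assumptions, not the MLIR test itself: `elect_sync` is a hypothetical stand-in for `nvvm.elect.sync` that is assumed to elect the lowest active lane; `__shfl_sync` is reduced to plain integer division (every lane of a warp already computes the same `threadIdx.x / 32`, so the shuffle only makes the value warp-uniform); and the final condition is taken to be `warp_idx_in_warp_group == 0`, the CUTLASS-style reading of "one leader lane in the first warp of each warpgroup".

```python
WARP_SIZE = 32
WARPS_PER_WARPGROUP = 4  # sm_90 warpgroup = 4 consecutive warps


def elect_sync(lane_id: int, active_mask: int = 0xFFFFFFFF) -> bool:
    """Hypothetical model of nvvm.elect.sync.

    Assumption: the leader is the lowest-numbered active lane in the mask.
    """
    if active_mask == 0:
        return False
    # Isolate the lowest set bit and convert it to a lane index.
    leader = (active_mask & -active_mask).bit_length() - 1
    return lane_id == leader


def predicate_for_thread(thread_idx_x: int) -> bool:
    lane_id = thread_idx_x % WARP_SIZE
    lane_predicate = elect_sync(lane_id)
    # Models __shfl_sync(0xffffffff, threadIdx.x / 32, 0): lane 0's warp
    # index is broadcast, but all lanes of a warp already agree on it.
    warp_idx = thread_idx_x // WARP_SIZE
    warp_idx_in_warp_group = warp_idx % WARPS_PER_WARPGROUP
    # Assumed intent: only the first warp of each warpgroup participates.
    return lane_predicate and warp_idx_in_warp_group == 0


def elected_threads(block_dim_x: int) -> list[int]:
    """Return the threadIdx.x values for which the predicate is true."""
    return [t for t in range(block_dim_x) if predicate_for_thread(t)]


if __name__ == "__main__":
    print(elected_threads(128))  # one warpgroup, as in this test
    print(elected_threads(256))  # two warpgroups
```

With a single warpgroup of 128 threads, only thread 0 passes the predicate, so exactly one thread issues the predicated PTX instructions. With 256 threads, one thread per warpgroup (0 and 128) is selected.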