riscv-gnu-toolchain/llvm.git

author	Guray Ozen <guray.ozen@gmail.com>	2023-11-10 16:52:00 +0100
committer	GitHub <noreply@github.com>	2023-11-10 16:52:00 +0100
commit	a00caad6bf318a7497d477b434464ca75ecb41fc (patch)
tree	3168423fcb6e22f7d92ac5bcf3ccf3fb484a66ef /llvm/lib/Support/CommandLine.cpp
parent	4ba50a783b4d6cbfc93d989e4d2bdfe4e4726e4a (diff)

[mlir] Add sm_90a GEMM test 128x128x128 (F32 =F16*F16) with predicate (#70028)

PR #69913 added a GEMM test (128x128x128 F32 += F16 * F16) with if-statement. This PR adds the same test using predicates in PTX. Predicate support is enabled using _BasicPtxBuilderInterface_ `(nvgpu.opcode ..., predicate = %pred)`.

The predicate condition is computed in `Step 2. [GPU] Elect fastest thread in CTA` inspired by cutlass. It is as follows:
```
lane_predicate = nvvm.elect.sync
warp_idx = __shfl_sync(0xffffffff, threadIdx.x / 32, 0)
warp_idx_in_warp_group = warp_idx % 4
predicate = (lane_predicate & warp_idx_in_warp_group)
```

Depends on #70027 #69934 #69935 #69584
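For readers who want to see which threads the predicate selects, here is a minimal host-side C++ sketch of the computation described above. It is a model, not the MLIR test itself. All names (`lanePredicate`, `electedPredicate`, `electedThreads`) are hypothetical. It rests on two assumptions: `nvvm.elect.sync` is modeled as "lane 0 of a fully active warp", and the final step is read in the CUTLASS style as "elected lane AND the warp is the first one in its 4-warp group" (an explicit `== 0` comparison), which may differ from the exact expression in the commit message.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side model; these names do not come from the MLIR test.
constexpr int kWarpSize = 32;
constexpr int kWarpsPerWarpGroup = 4;

// Stand-in for nvvm.elect.sync. Assumption: in a fully active warp,
// lane 0 is the elected lane.
bool lanePredicate(int threadIdx) { return threadIdx % kWarpSize == 0; }

// Models the predicate for one thread. warpIdx is uniform across a warp,
// which is what the __shfl_sync broadcast from lane 0 guarantees on device.
bool electedPredicate(int threadIdx) {
  int warpIdx = threadIdx / kWarpSize;
  int warpIdxInWarpGroup = warpIdx % kWarpsPerWarpGroup;
  // Assumed CUTLASS-style reading: elected lane of the first warp in the group.
  return lanePredicate(threadIdx) && warpIdxInWarpGroup == 0;
}

// Lists every thread index in a 1-D CTA for which the predicate holds.
std::vector<int> electedThreads(int blockDim) {
  assert(blockDim > 0);
  std::vector<int> elected;
  for (int t = 0; t < blockDim; ++t) {
    if (electedPredicate(t)) elected.push_back(t);
  }
  return elected;
}
```

Under these assumptions, a 128-thread CTA (one warp group) has exactly one thread with a true predicate, so a predicated PTX instruction would execute once per warp group without a branch.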

Diffstat (limited to 'llvm/lib/Support/CommandLine.cpp')

0 files changed, 0 insertions, 0 deletions

