riscv-gnu-toolchain/llvm.git

author	Andrea Faulds <andrea.faulds@amd.com>	2024-08-20 19:37:03 +0200
committer	GitHub <noreply@github.com>	2024-08-20 13:37:03 -0400
commit	7aa22f013e24d20291aad745368ff907baa9dfa4 (patch)
tree	f423f31202b37392ca308ce02f45016acb8d9cb0 /llvm/lib/Object/MachOObjectFile.cpp
parent	93eda08babe95188ee41400035abaade79cda7d1 (diff)

[mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (#104851)

This enables performing several reductions in parallel, each over fewer lanes than the full subgroup. One potential application is flash attention with subgroup-wide matrix multiplication and reduction combined in one kernel. The multiplication requires a 2D matrix to be distributed over the lanes of the subgroup, which in turn constrains the shape of the subsequent reduction if the data is to stay in registers.
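
As a rough illustration of what "several reductions in parallel" means, the following Python sketch simulates a subgroup as a list of per-lane values and reduces each cluster independently. It is not MLIR and not the lowering added by this commit: the function name `clustered_subgroup_reduce` is hypothetical, and the sketch assumes that clusters are contiguous runs of lanes and that the cluster size evenly divides the subgroup size.

```python
from functools import reduce
import operator


def clustered_subgroup_reduce(lane_values, cluster_size, op=operator.add):
    """Simulate a clustered subgroup reduction on a list of per-lane values.

    Hypothetical helper for illustration only. Assumes clusters are
    contiguous runs of `cluster_size` lanes and that `cluster_size`
    evenly divides the number of lanes.
    """
    num_lanes = len(lane_values)
    if cluster_size <= 0 or num_lanes % cluster_size != 0:
        raise ValueError("cluster_size must be positive and divide the subgroup size")

    result = []
    for start in range(0, num_lanes, cluster_size):
        cluster = lane_values[start:start + cluster_size]
        reduced = reduce(op, cluster)
        # Every lane in the cluster receives that cluster's reduced value.
        result.extend([reduced] * cluster_size)
    return result


# An 8-lane subgroup holding the values 0..7.
subgroup = list(range(8))

# Two independent reductions of 4 lanes each.
print(clustered_subgroup_reduce(subgroup, 4))  # [6, 6, 6, 6, 22, 22, 22, 22]

# A cluster spanning the whole subgroup matches an ordinary subgroup reduction.
print(clustered_subgroup_reduce(subgroup, 8))  # [28, 28, 28, 28, 28, 28, 28, 28]
```

In the flash-attention case described above, each cluster would plausibly correspond to one row (or column) of the 2D tile distributed over the lanes, so that all rows are reduced at once without leaving registers.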

Diffstat (limited to 'llvm/lib/Object/MachOObjectFile.cpp')

0 files changed, 0 insertions, 0 deletions