author | Andrea Faulds <andrea.faulds@amd.com> | 2024-08-20 19:37:03 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-08-20 13:37:03 -0400 |
commit | 7aa22f013e24d20291aad745368ff907baa9dfa4 (patch) | |
tree | f423f31202b37392ca308ce02f45016acb8d9cb0 /llvm/lib/Object/MachOObjectFile.cpp | |
parent | 93eda08babe95188ee41400035abaade79cda7d1 (diff) | |
[mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (#104851)

This enables performing several reductions in parallel, each smaller
than the size of the subgroup.

One potential application is flash attention with subgroup-wide matrix
multiplication and reduction combined in one kernel. The multiplication
operation requires a 2D matrix to be distributed over the lanes of the
subgroup, which then constrains the shape the following reduction can
have if we want to keep data in registers.
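
As a rough illustration of the behaviour described above, the following Python sketch models a subgroup as a list with one value per lane and reduces each cluster independently. It is not MLIR and not the implementation from this commit: the function name `clustered_subgroup_reduce` is hypothetical, and the sketch assumes that clusters are contiguous groups of lanes starting at lane 0 and that every lane receives the result of its own cluster.

```python
from functools import reduce
import operator


def clustered_subgroup_reduce(lane_values, cluster_size=None, op=operator.add):
    """Model a per-cluster reduction over the lanes of one subgroup.

    lane_values:  one value per lane (list index = lane id).
    cluster_size: lanes per cluster; None means one reduction over the
                  whole subgroup.
    Assumption: clusters are contiguous and start at lane 0.
    """
    subgroup_size = len(lane_values)
    if cluster_size is None:
        cluster_size = subgroup_size
    if cluster_size <= 0 or subgroup_size % cluster_size != 0:
        raise ValueError("cluster_size must evenly divide the subgroup size")

    result = []
    for start in range(0, subgroup_size, cluster_size):
        cluster = lane_values[start:start + cluster_size]
        total = reduce(op, cluster)
        # Every lane in the cluster sees the same reduced value.
        result.extend([total] * cluster_size)
    return result


# An 8-lane subgroup split into two clusters of 4 lanes each.
lanes = [1, 2, 3, 4, 5, 6, 7, 8]
print(clustered_subgroup_reduce(lanes, cluster_size=4))
print(clustered_subgroup_reduce(lanes))
```

With `cluster_size=4`, the two halves of the 8-lane subgroup are reduced separately, which models several smaller reductions running side by side; omitting `cluster_size` falls back to a single reduction across all lanes.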
Diffstat (limited to 'llvm/lib/Object/MachOObjectFile.cpp')
0 files changed, 0 insertions, 0 deletions