path: root/llvm/lib/Object/MachOObjectFile.cpp
author: Andrea Faulds <andrea.faulds@amd.com> 2024-08-20 19:37:03 +0200
committer: GitHub <noreply@github.com> 2024-08-20 13:37:03 -0400
commit: 7aa22f013e24d20291aad745368ff907baa9dfa4
tree: f423f31202b37392ca308ce02f45016acb8d9cb0 /llvm/lib/Object/MachOObjectFile.cpp
parent: 93eda08babe95188ee41400035abaade79cda7d1
[mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (#104851)
This enables performing several reductions in parallel, each over fewer lanes than the full subgroup. One potential application is flash attention, with subgroup-wide matrix multiplication and reduction combined in one kernel. The multiplication requires a 2D matrix to be distributed over the lanes of the subgroup, which in turn constrains the shape of the subsequent reduction if the data is to stay in registers.
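The idea of several smaller reductions running side by side can be illustrated with a small host-side simulation. This is only a sketch, not the MLIR implementation: the function name `clustered_subgroup_reduce` is hypothetical, and it assumes that clusters are formed from contiguous lanes and that every lane receives the result of its own cluster.

```python
from functools import reduce
from operator import add


def clustered_subgroup_reduce(lane_values, cluster_size, op=add):
    """Simulate a subgroup reduction split into independent clusters.

    lane_values:  one value per lane of the (simulated) subgroup.
    cluster_size: number of lanes per cluster (assumed contiguous here).
    op:           binary reduction operator, addition by default.
    """
    subgroup_size = len(lane_values)
    if cluster_size <= 0 or subgroup_size % cluster_size != 0:
        raise ValueError("cluster_size must evenly divide the subgroup size")

    result = []
    for start in range(0, subgroup_size, cluster_size):
        cluster = lane_values[start:start + cluster_size]
        # Each cluster is reduced on its own, independently of the others.
        total = reduce(op, cluster)
        # Every lane in the cluster sees that cluster's reduced value.
        result.extend([total] * cluster_size)
    return result


# Eight lanes, two clusters of four: two reductions happen "in parallel".
print(clustered_subgroup_reduce(list(range(8)), cluster_size=4))
# A cluster as large as the subgroup is an ordinary full reduction.
print(clustered_subgroup_reduce(list(range(8)), cluster_size=8))
```

With a cluster size equal to the subgroup size, the sketch reduces over all lanes at once, which matches the behavior described as the case without clustering.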
Diffstat (limited to 'llvm/lib/Object/MachOObjectFile.cpp')
0 files changed, 0 insertions, 0 deletions