path: root/clang/lib/Frontend/CompilerInvocation.cpp
author: Joseph Huber <huberjn@outlook.com> 2025-07-27 20:45:47 -0500
committer: Joseph Huber <huberjn@outlook.com> 2025-07-28 09:23:29 -0500
commit: a7649007ef269c397b5d474d1b5f4432da96d1de
tree: 5365b9a013292701936b2a3ea7832f79b31178e6 /clang/lib/Frontend/CompilerInvocation.cpp
parent: 9975dfdf800d9881b704a988bc004ec81639fe67
[libc] Rework match any use in hot allocate bitfield loop
Summary: We previously used `match_any` as the shortcut to figure out which threads were destined for which slots. This lowers to a for-loop which, even if it often executes only once, still causes some slowdown, especially when divergent. Instead we use a single ballot call and compute the grouping from its result. Here the ballot tells us which lanes are the first in a block, either the starting index or the boundary for a new 32-bit int. We then use some bit magic to find, for each lane ID, its closest leader. For the length, we simply use the length calculated by the leader for the remaining bits to be written. This removes the match any and the shuffle, which improves the minimum number of cycles this takes by about 5%.
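As a rough illustration of the "closest leader" step, here is a host-side C++ sketch that simulates the ballot result as a 64-bit mask. It is not the code from this patch: the helper names `leader_mask` and `closest_leader` are hypothetical, the rule used to build the mask (lane 0, or any lane whose bit index lands on a multiple of 32) is an assumption based on the description above, and `__builtin_clzll` is a GCC/Clang builtin rather than anything from the libc GPU utilities.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the ballot: bit i is set when lane i is the first
// lane of a block. Assumes lane i writes bit (start + i), so a new block
// begins at lane 0 and wherever that bit index crosses a 32-bit boundary.
inline uint64_t leader_mask(uint32_t start, uint32_t num_lanes) {
  assert(num_lanes <= 64 && "mask only models up to 64 lanes");
  uint64_t mask = 0;
  for (uint32_t lane = 0; lane < num_lanes; ++lane)
    if (lane == 0 || (start + lane) % 32 == 0)
      mask |= 1ull << lane;
  return mask;
}

// Hypothetical helper: index of the closest leader at or below `lane`.
// Masks off every bit above `lane`, then takes the highest remaining set bit.
inline uint32_t closest_leader(uint64_t leaders, uint32_t lane) {
  assert(lane < 64 && "lane index out of range");
  // Keep only bits [0, lane].
  uint64_t at_or_below = leaders & (~0ull >> (63 - lane));
  assert(at_or_below != 0 && "expects a leader at or below this lane");
  // Highest set bit via count-leading-zeros (GCC/Clang builtin).
  return 63 - static_cast<uint32_t>(__builtin_clzll(at_or_below));
}
```

For example, with `start = 30` and eight lanes, lanes 0 and 2 would be leaders under this assumed rule, so lanes 0 and 1 resolve to leader 0 and lanes 2 through 7 resolve to leader 2, with no per-group loop involved.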
Diffstat (limited to 'clang/lib/Frontend/CompilerInvocation.cpp')
0 files changed, 0 insertions, 0 deletions