author | Joseph Huber <huberjn@outlook.com> | 2025-07-27 20:45:47 -0500 |
---|---|---|
committer | Joseph Huber <huberjn@outlook.com> | 2025-07-28 09:23:29 -0500 |
commit | a7649007ef269c397b5d474d1b5f4432da96d1de (patch) | |
tree | 5365b9a013292701936b2a3ea7832f79b31178e6 /clang/lib/Frontend/CompilerInvocation.cpp | |
parent | 9975dfdf800d9881b704a988bc004ec81639fe67 (diff) | |
[libc] Rework match any use in hot allocate bitfield loop
Summary:
We previously used `match_any` as the shortcut to figure out which
threads were destined for which slots. This lowers to a for-loop, which,
even if it often executes only once, still causes some slowdown,
especially when divergent. Instead, we use a single ballot call and then
compute the mapping from the resulting mask.
Here the ballot tells us which lanes are the first in a block: either
the starting index or the boundary of a new 32-bit int. We then use some
bit magic to find, for each lane ID, its closest leader. For the
length, we simply reuse the leader's calculated length of the
remaining bits to be written. This removes the `match_any` and the
shuffle, which reduces the minimum number of cycles this takes by about
5%.
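
The leader lookup described above can be illustrated with a host-side sketch.
This is not the code from the patch: `closest_leader` and `leader_ballot` are
hypothetical names, the ballot is simulated with a plain loop rather than a GPU
intrinsic, and `__builtin_clzll` assumes a GCC- or Clang-compatible compiler.
The length handling is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: index of the closest leader lane at or below `lane`.
// `leaders` is a ballot-style mask with bit i set when lane i starts a block.
inline uint32_t closest_leader(uint64_t leaders, uint32_t lane) {
  assert(lane < 64 && "lane id out of range");
  // Keep only the bits for lanes 0..lane (inclusive).
  uint64_t at_or_below = leaders & (~0ull >> (63 - lane));
  assert(at_or_below != 0 && "expected a leader at or below this lane");
  // The highest remaining set bit is the nearest leader.
  return 63 - static_cast<uint32_t>(__builtin_clzll(at_or_below));
}

// Host-side stand-in for the ballot (assumption: lane i writes bit start + i).
// A lane leads when it is the first lane or when its bit index begins a new
// 32-bit word.
inline uint64_t leader_ballot(uint32_t start, uint32_t num_lanes) {
  uint64_t mask = 0;
  for (uint32_t lane = 0; lane < num_lanes && lane < 64; ++lane)
    if (lane == 0 || (start + lane) % 32 == 0)
      mask |= 1ull << lane;
  return mask;
}
```

For example, with a starting index of 28 and ten lanes, lanes 0 and 4 are
leaders, so lanes 0 through 3 resolve to leader 0 and lanes 4 through 9 resolve
to leader 4, with no loop over matching lanes.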