path: root/clang/lib/Frontend/CompilerInvocation.cpp
author: Joseph Huber <huberjn@outlook.com> 2025-07-26 22:31:50 -0500
committer: Joseph Huber <huberjn@outlook.com> 2025-07-28 09:23:29 -0500
commit: 9975dfdf800d9881b704a988bc004ec81639fe67 (patch)
tree: e2a2a661a6695e7b886d60f3f7899bbc5c8a764a /clang/lib/Frontend/CompilerInvocation.cpp
parent: 166493d6927026c4933be82de81adabc9751c0e3 (diff)
[libc] Small performance improvements to GPU allocator
Summary: This commit slightly improves performance in a few places. First, we optimistically assume the cached slab has ample space, which lets us skip the atomic load on the highly contended counter when the allocation is likely to succeed. Second, we no longer call `match_any` twice, since we can compute the uniform slabs at the moment we return them. Third, we always choose a random index on a 32-bit boundary. In the fast case this fulfils the allocation with a single `fetch_or`; otherwise we quickly move to the free bit. This nets around a 7.75% improvement for the fast path case.
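As an illustration of the third point, here is a minimal host-side sketch of claiming a bit when the starting index is aligned to a 32-bit word. It is not the allocator's actual code: `Bitfield`, `align_to_word`, `lowest_zero_bit`, and `claim_bit` are hypothetical names, the bitfield size is arbitrary, and `std::atomic` stands in for the GPU atomics the real implementation uses.

```cpp
#include <atomic>
#include <cstdint>

using u32 = std::uint32_t;

// Hypothetical sizes, chosen only for this sketch.
constexpr u32 kBitsPerWord = 32;
constexpr u32 kWords = 8;

// One bit per allocatable chunk; a set bit means the chunk is in use.
struct Bitfield {
  std::atomic<u32> words[kWords] = {};
};

// Round an index down to the start of its 32-bit word.
constexpr u32 align_to_word(u32 index) { return index & ~(kBitsPerWord - 1); }

// Position of the lowest clear bit in `v`, or 32 if every bit is set.
inline u32 lowest_zero_bit(u32 v) {
  for (u32 b = 0; b < kBitsPerWord; ++b)
    if (!(v & (u32(1) << b)))
      return b;
  return kBitsPerWord;
}

// Claim one free bit, starting at the word that contains `start`.
// Returns the global bit index, or -1 if no bit could be claimed.
inline std::int64_t claim_bit(Bitfield &bf, u32 start) {
  u32 first = align_to_word(start % (kWords * kBitsPerWord)) / kBitsPerWord;
  for (u32 i = 0; i < kWords; ++i) {
    u32 w = (first + i) % kWords;
    u32 bit = 0; // Word-aligned start: always try bit 0 of the word first.
    while (bit < kBitsPerWord) {
      u32 mask = u32(1) << bit;
      u32 before = bf.words[w].fetch_or(mask);
      if (!(before & mask)) // Fast case: a single fetch_or claimed the bit.
        return std::int64_t(w) * kBitsPerWord + bit;
      // The bit was taken; reuse the value we just observed to jump
      // straight to the lowest free bit instead of probing one by one.
      bit = lowest_zero_bit(before);
    }
  }
  return -1;
}
```

Because the first probe always targets bit 0 of a word, an empty word is claimed with one `fetch_or`, and a partially full word reveals its free bits in the value that same `fetch_or` returns.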
Diffstat (limited to 'clang/lib/Frontend/CompilerInvocation.cpp')
0 files changed, 0 insertions, 0 deletions