author | Joseph Huber <huberjn@outlook.com> | 2025-07-26 22:31:50 -0500 |
---|---|---|
committer | Joseph Huber <huberjn@outlook.com> | 2025-07-28 09:23:29 -0500 |
commit | 9975dfdf800d9881b704a988bc004ec81639fe67 (patch) | |
tree | e2a2a661a6695e7b886d60f3f7899bbc5c8a764a /clang/lib/Frontend/CompilerInvocation.cpp | |
parent | 166493d6927026c4933be82de81adabc9751c0e3 (diff) | |
[libc] Small performance improvements to GPU allocator
Summary:
This slightly increases performance in a few places. First, we
optimistically assume the cached slab has ample space which lets us
avoid the atomic load on the highly contended counter in the case that
it is likely to succeed. Second, we no longer call `match_any` twice as
we can calculate the uniform slabs at the moment we return them.
Third, we always choose a random index on a 32-bit boundary. This
means that in the fast case we fulfil the allocation with a single
`fetch_or`, and in the other case we quickly move to the free bit.
This nets around a 7.75% improvement for the fast path case.
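
The third point can be illustrated with a minimal host-side sketch. This is not the allocator's actual code: the names `Bitfield`, `allocate`, `first_zero_bit`, `NUM_WORDS`, and `NO_SLOT` are hypothetical, and the sketch assumes a single thread claiming one bit at a time rather than cooperating GPU lanes. It only shows why starting on a 32-bit boundary lets the common case finish with one `fetch_or`, and how the value returned by a failed `fetch_or` can be reused to jump straight to a free bit.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sizes for illustration; the real slab bitfield is much larger.
constexpr uint32_t BITS_IN_WORD = 32;
constexpr uint32_t NUM_WORDS = 4;
constexpr uint32_t NUM_BITS = BITS_IN_WORD * NUM_WORDS;
constexpr uint32_t NO_SLOT = UINT32_MAX;

struct Bitfield {
  std::atomic<uint32_t> words[NUM_WORDS];
  Bitfield() {
    for (auto &w : words)
      w.store(0, std::memory_order_relaxed);
  }
};

// Index of the lowest clear bit in v, or 32 if every bit is set.
inline uint32_t first_zero_bit(uint32_t v) {
  uint32_t i = 0;
  while (i < BITS_IN_WORD && ((v >> i) & 1u))
    ++i;
  return i;
}

// Claims one free bit and returns its index, or NO_SLOT if none was found.
inline uint32_t allocate(Bitfield &bf, uint32_t random) {
  // Round the random start down to a 32-bit boundary.
  uint32_t index = (random % NUM_BITS) & ~(BITS_IN_WORD - 1);
  for (uint32_t attempts = 0; attempts < 2 * NUM_BITS; ++attempts) {
    uint32_t word = index / BITS_IN_WORD;
    uint32_t mask = uint32_t(1) << (index % BITS_IN_WORD);
    // Fast case: a single fetch_or both claims the bit and reports its
    // previous state.
    uint32_t before = bf.words[word].fetch_or(mask, std::memory_order_acq_rel);
    if (!(before & mask))
      return index;
    // Slow case: reuse the observed word to move directly to a free bit,
    // or to the start of the next word if this one is full.
    uint32_t free_bit = first_zero_bit(before | mask);
    index = free_bit < BITS_IN_WORD
                ? word * BITS_IN_WORD + free_bit
                : ((word + 1) % NUM_WORDS) * BITS_IN_WORD;
  }
  return NO_SLOT;
}
```

For example, `allocate(bf, 37)` on an empty bitfield rounds the start down to bit 32 and succeeds with one `fetch_or`; a second call that lands on the same word sees bit 32 already set and moves to bit 33 on its next attempt.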