path: root/llvm/lib/Object/WasmObjectFile.cpp
author: Joseph Huber <huberjn@outlook.com> 2024-01-02 16:53:53 -0600
committer: GitHub <noreply@github.com> 2024-01-02 16:53:53 -0600
commit: fb32977ac768f27890af28308a6968c30af2aa3e
tree: 26bb91d96402f02fa8648954819e4ba98f2924e8 /llvm/lib/Object/WasmObjectFile.cpp
parent: 41a07e668c29e219ed2f26d61da8b6b3295ff967
[Libomptarget] Fix RPC-based malloc on NVPTX (#72440)
Summary: The device allocator on NVPTX architectures is enqueued to a stream that the kernel is potentially executing on. This can lead to deadlocks, as the kernel will not proceed until the allocation is complete and the allocation will not proceed until the kernel is complete. CUDA 11.2 introduced async allocations that we can manually place on separate streams to combat this. This patch makes a new allocation type that's guaranteed to be non-blocking so it will actually make progress. Only Nvidia needs to care about this, as the others are not blocking in this way by default.

I had originally tried to make the `alloc` and `free` methods take a `__tgt_async_info`. However, I observed that with the large volume of streams being created by a parallel test it quickly locked up the system, presumably because too many streams were being created. This implementation now just creates a new stream and immediately destroys it. This obviously isn't very fast, but it at least gets the cases to stop deadlocking for now.
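For illustration only, the approach described above (allocate on a fresh stream, wait for it, then destroy the stream) might look roughly like the following sketch against the CUDA driver API. The helper name `allocNonBlocking` is hypothetical and is not taken from the patch; the sketch also assumes CUDA 11.2 or newer and that a CUDA context is already current on the calling thread.

```cpp
#include <cuda.h>
#include <cstddef>

// Hypothetical helper (name and shape are assumptions, not the patch's code).
// Allocates Size bytes of device memory on a temporary stream so the request
// is not queued behind a kernel running on an existing stream.
CUresult allocNonBlocking(size_t Size, CUdeviceptr *Ptr) {
  CUstream Stream;
  // A non-blocking stream does not implicitly synchronize with the default
  // (NULL) stream.
  CUresult Res = cuStreamCreate(&Stream, CU_STREAM_NON_BLOCKING);
  if (Res != CUDA_SUCCESS)
    return Res;

  // Stream-ordered allocation, available since CUDA 11.2.
  Res = cuMemAllocAsync(Ptr, Size, Stream);
  if (Res == CUDA_SUCCESS)
    Res = cuStreamSynchronize(Stream); // Wait until the allocation is done.

  // Destroy the temporary stream whether or not the allocation succeeded.
  CUresult DestroyRes = cuStreamDestroy(Stream);
  return Res != CUDA_SUCCESS ? Res : DestroyRes;
}
```

Creating and destroying a stream per allocation is slow, as the message notes; the sketch trades speed for guaranteed forward progress.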
Diffstat (limited to 'llvm/lib/Object/WasmObjectFile.cpp')
0 files changed, 0 insertions, 0 deletions