riscv-gnu-toolchain/llvm.git

author	Joseph Huber <huberjn@outlook.com>	2024-01-02 16:53:53 -0600
committer	GitHub <noreply@github.com>	2024-01-02 16:53:53 -0600
commit	fb32977ac768f27890af28308a6968c30af2aa3e (patch)
tree	26bb91d96402f02fa8648954819e4ba98f2924e8 /llvm/lib/Object/WasmObjectFile.cpp
parent	41a07e668c29e219ed2f26d61da8b6b3295ff967 (diff)

[Libomptarget] Fix RPC-based malloc on NVPTX (#72440)

Summary: The device allocator on NVPTX architectures is enqueued to a stream that the kernel is potentially executing on. This can lead to deadlocks, as the kernel will not proceed until the allocation is complete and the allocation will not proceed until the kernel is complete. CUDA 11.2 introduced async allocations that we can manually place on separate streams to combat this. This patch adds a new allocation type that is guaranteed to be non-blocking, so it will actually make progress. Only Nvidia needs to care about this, as the other targets do not block in this way by default.

I had originally tried to make the `alloc` and `free` methods take a `__tgt_async_info`. However, I observed that the large volume of streams created by a parallel test quickly locked up the system, presumably because too many streams were being created. This implementation now just creates a new stream and immediately destroys it. This obviously isn't very fast, but it at least gets the cases to stop deadlocking for now.
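The deadlock described above can be illustrated with a small, self-contained C++ model. This is only a sketch under stated assumptions, not the code from this patch and not the CUDA API: `Stream`, `Task`, `runToCompletion`, and `simulate` are hypothetical names, and a stream is modelled as a plain in-order queue in which only the front task may run. The kernel task completes only once the allocation it requested has finished.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <vector>

// A task returns true once it has completed and false while it is blocked.
using Task = std::function<bool()>;

// Toy model of an in-order stream (hypothetical, not a CUDA stream):
// only the task at the front of the queue is allowed to run.
struct Stream {
  std::deque<Task> Tasks;
};

// Repeatedly tries the front task of every stream. Returns true if all
// streams drain, or false if a full pass over the streams makes no
// progress, which counts as a deadlock in this model.
bool runToCompletion(std::vector<Stream> &Streams) {
  assert(!Streams.empty() && "need at least one stream");
  for (;;) {
    bool Pending = false, Progress = false;
    for (Stream &S : Streams) {
      if (S.Tasks.empty())
        continue;
      Pending = true;
      if (S.Tasks.front()()) {
        S.Tasks.pop_front();
        Progress = true;
      }
    }
    if (!Pending)
      return true;  // Every task finished.
    if (!Progress)
      return false; // Every remaining front task is blocked.
  }
}

// A kernel waits for an allocation it requested. The allocation is
// enqueued either behind the kernel on the same stream or on a stream
// of its own. Returns true if the scenario runs to completion.
bool simulate(bool UseSeparateStream) {
  bool AllocationDone = false;
  std::vector<Stream> Streams(UseSeparateStream ? 2 : 1);

  // The kernel cannot finish until the allocation has been serviced.
  Task Kernel = [&AllocationDone] { return AllocationDone; };
  // The allocation itself never blocks once it reaches the front.
  Task Alloc = [&AllocationDone] {
    AllocationDone = true;
    return true;
  };

  Streams[0].Tasks.push_back(Kernel);
  Streams[UseSeparateStream ? 1 : 0].Tasks.push_back(Alloc);
  return runToCompletion(Streams);
}
```

In this model, `simulate(false)` reports a deadlock because the allocation is stuck behind the kernel that is waiting for it, while `simulate(true)` completes because the allocation runs on its own stream. The model says nothing about the cost of creating and destroying a stream per allocation, which the message above notes is slow.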

Diffstat (limited to 'llvm/lib/Object/WasmObjectFile.cpp')

0 files changed, 0 insertions, 0 deletions