riscv-gnu-toolchain/llvm.git

author	Joseph Huber <35342157+jhuber6@users.noreply.github.com>	2023-10-27 14:55:37 -0500
committer	GitHub <noreply@github.com>	2023-10-27 14:55:37 -0500
commit	8e447a123b0006b4c14fcb61049559039b3da32f
tree	d1dded85da1afa2ff47dbcaa8e619478c39497ac /llvm/lib/CodeGen/MachineBasicBlock.cpp
parent	741579930e9b49cee9d859a7468f5e9d0c6f2006

[libc] Optimize the RPC memory copy for the AMDGPU target (#70467)

Summary: We previously changed the GPU target to use builtin implementations of the memory copy functions. However, this had the negative effect of massively increasing register usage when using the printing interface. For example, a `printf` call went from using 25 VGPRs to 54 solely because of the builtin. We probably still want to export the builtin, but for the RPC interface we heavily prefer small resource usage over the performance gains of fully unrolling this loop. For NVPTX, however, the builtin implementation causes the resource usage to go down (36 registers total for a regular `fputs` call), so we will keep that implementation. I think specializing this is the right call: we will always prefer the implementation with the smallest resource footprint for this interface, because performance is already going to be heavily bottlenecked by the use of fine-grained memory.
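A minimal C++ sketch of the per-target specialization described above, assuming the compiler defines `__AMDGPU__` when building for AMDGPU. The helper names `rpc_memcpy` and `copy_byte_per_byte` are hypothetical, and this is not the code from this commit. On AMDGPU it copies one byte per iteration with unrolling disabled to keep the register footprint small. On every other target (e.g. NVPTX, or a host build) it defers to `__builtin_memcpy`.

```cpp
#include <cstddef>

// Hypothetical helper: copies one byte per iteration. Disabling unrolling
// (Clang only) keeps the loop body, and thus its register usage, small.
inline void copy_byte_per_byte(unsigned char *dst, const unsigned char *src,
                               std::size_t count) {
#if defined(__clang__)
#pragma clang loop unroll(disable)
#endif
  for (std::size_t i = 0; i < count; ++i)
    dst[i] = src[i];
}

// Hypothetical RPC copy routine specialized per target.
// Assumption: __AMDGPU__ is defined when compiling for AMDGPU.
inline void rpc_memcpy(void *dst, const void *src, std::size_t count) {
#if defined(__AMDGPU__)
  // AMDGPU: favor a small register footprint over copy throughput.
  copy_byte_per_byte(static_cast<unsigned char *>(dst),
                     static_cast<const unsigned char *>(src), count);
#else
  // Other targets (e.g. NVPTX): use the compiler builtin.
  __builtin_memcpy(dst, src, count);
#endif
}
```

An optimizer may still recognize a plain byte loop as a memcpy idiom and rewrite it, so a real implementation might need further measures to keep the loop form on AMDGPU.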

Diffstat (limited to 'llvm/lib/CodeGen/MachineBasicBlock.cpp')

0 files changed, 0 insertions, 0 deletions