author: Joseph Huber <35342157+jhuber6@users.noreply.github.com> 2023-10-27 14:55:37 -0500
committer: GitHub <noreply@github.com> 2023-10-27 14:55:37 -0500
commit: 8e447a123b0006b4c14fcb61049559039b3da32f
tree: d1dded85da1afa2ff47dbcaa8e619478c39497ac /llvm/lib/CodeGen/MachineBasicBlock.cpp
parent: 741579930e9b49cee9d859a7468f5e9d0c6f2006
[libc] Optimize the RPC memory copy for the AMDGPU target (#70467)
Summary: We previously changed the GPU targets to use the builtin implementations of the memory copy functions. However, this had the negative effect of massively increasing register usage when using the printing interface. For example, a `printf` call went from using 25 VGPRs to 54 simply because of using the builtin. We probably still want to export the builtin, but for the RPC interface we heavily prefer small resource usage over the performance gains of fully unrolling this loop. For NVPTX, however, the builtin implementation causes the resource usage to go down (36 registers total for a regular `fputs` call), so we will maintain that implementation. I think specializing this is the right call, as we will always prefer the implementation with the smallest resource footprint for this interface; performance is already going to be heavily bottlenecked by the use of fine-grained memory.
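The specialization described above can be illustrated with a rough sketch. This is not the code from the patch: the names `copy_byte_per_byte` and `rpc_copy_sketch` are hypothetical, and keying on the `__AMDGPU__` macro that Clang defines for that target is an assumption about how the two cases might be told apart. On AMDGPU the copy is a plain loop with unrolling disabled; everywhere else it defers to the compiler builtin.

```cpp
#include <cstddef>

// Hypothetical helper (not from the patch): a deliberately simple
// byte-by-byte copy. The pragma asks Clang not to unroll the loop, which is
// the property the commit message cares about for register pressure.
inline void copy_byte_per_byte(unsigned char *dst, const unsigned char *src,
                               std::size_t count) {
#if defined(__clang__)
#pragma clang loop unroll(disable)
#endif
  for (std::size_t i = 0; i < count; ++i)
    dst[i] = src[i];
}

// Hypothetical wrapper (not from the patch): choose the copy strategy per
// target. Assumption: __AMDGPU__ identifies the AMDGPU target; all other
// targets, including NVPTX, keep the builtin.
inline void rpc_copy_sketch(void *dst, const void *src, std::size_t count) {
#if defined(__AMDGPU__)
  copy_byte_per_byte(static_cast<unsigned char *>(dst),
                     static_cast<const unsigned char *>(src), count);
#else
  __builtin_memcpy(dst, src, count);
#endif
}
```

Both branches copy the same bytes; only the generated code differs. Whether the loop actually stays un-unrolled, and what register counts result, depends on the compiler and optimization level, so the figures quoted in the commit message should not be expected from this sketch.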
Diffstat (limited to 'llvm/lib/CodeGen/MachineBasicBlock.cpp')
0 files changed, 0 insertions, 0 deletions