[RFC][AMDGPU] Add AMDGPU_SUMMARY bitcode block for ThinLTO - rocket-tools/riscv-gnu-toolchain/llvm.git


author	Shilei Tian <i@tianshilei.me>	2026-04-28 00:33:42 -0400
committer	Shilei Tian <i@tianshilei.me>	2026-04-28 21:15:00 -0400
commit	630bff8a2248da1873f27060d17301b5a5606ebb (patch)
tree	b52eb68f1b6aefc18a1fce7d7d63d12aa62af817 /offload
parent	383733ea8d15524517b0f1f15c8380c24f17407d (diff)

[RFC][AMDGPU] Add AMDGPU_SUMMARY bitcode block for ThinLTO (users/shiltian/amdgpu-thinlto-summary-block)

With AMDGPU object linking, device functions are compiled separately from the kernels that call them. Without whole-program visibility, the compiler must be conservative about occupancy for every device function, leading to suboptimal resource usage. However, GPU kernels typically carry explicit occupancy control attributes that constrain the launch environment.

ThinLTO is the natural place to propagate these kernel attributes to callees: the combined module summary index contains a cross-TU call graph, allowing occupancy information to be propagated top-down from kernels to all reachable device functions. The backend can then generate better code with the propagated constraints, achieving whole-program awareness without the compile-time overhead of full LTO.

This patch introduces a dedicated AMDGPU_SUMMARY bitcode block that serializes per-function summary data alongside the standard module summary. The block is scoped to AMDGPU so that non-AMDGPU targets are completely unaffected. A follow-up patch will add the ThinLTO propagation logic that reads these summaries and applies conservative attribute bounds to device functions reachable from multiple kernels.
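
For illustration only, the top-down propagation described above could look roughly like the standalone sketch below. It is not code from this patch or from the follow-up. `Range`, `FuncSummary`, `Index`, `hull`, and `propagate` are hypothetical names, the call graph is a plain map rather than the real module summary index, and the merge rule (widening to the smallest range that covers every reaching kernel) is an assumed reading of "conservative attribute bounds". The sketch also assumes every kernel carries a range; a kernel without an explicit attribute would need to be seeded with the target's default range.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical [Min, Max] bound taken from a kernel's occupancy attribute.
struct Range {
  unsigned Min, Max;
};

// Hypothetical stand-in for one per-function summary record.
struct FuncSummary {
  bool IsKernel = false;
  Range Bound{0, 0};                // only meaningful when IsKernel is true
  std::vector<std::string> Callees; // cross-TU call edges
};

// Hypothetical stand-in for the combined index: function name -> summary.
using Index = std::map<std::string, FuncSummary>;

// Smallest range covering both inputs (assumed conservative merge rule).
static Range hull(Range A, Range B) {
  return {std::min(A.Min, B.Min), std::max(A.Max, B.Max)};
}

// Push kernel bounds down the call graph with a worklist. Ranges only ever
// widen, so the loop reaches a fixed point even when the graph has cycles.
std::map<std::string, Range> propagate(const Index &I) {
  std::map<std::string, Range> Result;
  std::queue<std::string> Work;

  // Seed the worklist with every kernel and its own bound.
  for (const auto &[Name, S] : I) {
    if (S.IsKernel) {
      Result[Name] = S.Bound;
      Work.push(Name);
    }
  }

  while (!Work.empty()) {
    std::string Caller = Work.front();
    Work.pop();
    Range In = Result.at(Caller);
    for (const std::string &Callee : I.at(Caller).Callees) {
      auto CI = I.find(Callee);
      // Skip unknown functions; kernels keep their own explicit bound.
      if (CI == I.end() || CI->second.IsKernel)
        continue;
      auto [Pos, Inserted] = Result.try_emplace(Callee, In);
      if (!Inserted) {
        Range Merged = hull(Pos->second, In);
        if (Merged.Min == Pos->second.Min && Merged.Max == Pos->second.Max)
          continue; // nothing changed, no need to revisit the callee
        Pos->second = Merged;
      }
      Work.push(Callee);
    }
  }
  return Result;
}
```

With this sketch, a helper called from one kernel bounded by 1..256 and another bounded by 1..1024 ends up with 1..1024, while a function called only from the first kernel keeps 1..256.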

Diffstat (limited to 'offload')

0 files changed, 0 insertions, 0 deletions

