author: Shilei Tian <i@tianshilei.me> 2026-04-28 00:33:42 -0400
committer: Shilei Tian <i@tianshilei.me> 2026-04-28 21:15:00 -0400
commit: 630bff8a2248da1873f27060d17301b5a5606ebb
tree: b52eb68f1b6aefc18a1fce7d7d63d12aa62af817 /offload
parent: 383733ea8d15524517b0f1f15c8380c24f17407d
[RFC][AMDGPU] Add AMDGPU_SUMMARY bitcode block for ThinLTO
(users/shiltian/amdgpu-thinlto-summary-block)
With AMDGPU object linking, device functions are compiled separately from the kernels that call them. Without whole-program visibility, the compiler must be conservative about occupancy for every device function, leading to suboptimal resource usage. However, GPU kernels typically carry explicit occupancy control attributes that constrain the launch environment.

ThinLTO is the natural place to propagate these kernel attributes to callees: the combined module summary index contains a cross-TU call graph, allowing occupancy information to be propagated top-down from kernels to all reachable device functions. The backend can then generate better code with the propagated constraints, achieving whole-program awareness without the compile-time overhead of full LTO.

This patch introduces a dedicated AMDGPU_SUMMARY bitcode block that serializes per-function summary data alongside the standard module summary. The block is scoped to AMDGPU so that non-AMDGPU targets are completely unaffected.

A follow-up patch will add the ThinLTO propagation logic that reads these summaries and applies conservative attribute bounds to device functions reachable from multiple kernels.
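
To make the top-down propagation idea concrete, here is a minimal sketch in Python. It is not the code from this patch or from the follow-up, which is not shown here. It assumes each kernel carries a (min, max) range, loosely modeled on the existing `amdgpu-flat-work-group-size` attribute, and that the conservative bound for a function reachable from several kernels is the smallest range that encloses all of its callers' ranges. The function name `propagate_bounds`, the dictionary-based call graph, and the merge rule are all hypothetical and chosen only for illustration.

```python
from collections import deque


def propagate_bounds(call_graph, kernel_bounds):
    """Propagate (min, max) ranges from kernels to every reachable callee.

    call_graph:    dict mapping a function name to a list of callee names
                   (a stand-in for the cross-TU call graph in the summary index).
    kernel_bounds: dict mapping a kernel name to its explicit (min, max) range.
    Returns a dict mapping every reachable function to its merged range.
    """
    bounds = dict(kernel_bounds)
    worklist = deque(kernel_bounds)
    while worklist:
        caller = worklist.popleft()
        lo, hi = bounds[caller]
        for callee in call_graph.get(caller, ()):
            if callee in kernel_bounds:
                continue  # kernels keep their own explicit attributes
            old = bounds.get(callee)
            # Assumed merge rule: widen to the range enclosing all callers.
            new = (lo, hi) if old is None else (min(old[0], lo), max(old[1], hi))
            if new != old:
                bounds[callee] = new
                worklist.append(callee)  # revisit so the change reaches callees
    return bounds


# Hypothetical call graph: "shared" is reachable from both kernels.
example_graph = {
    "kernel_a": ["helper", "shared"],
    "kernel_b": ["shared"],
    "shared": ["leaf"],
}
example_bounds = propagate_bounds(
    example_graph, {"kernel_a": (64, 256), "kernel_b": (128, 1024)}
)
```

In this sketch `helper` inherits `(64, 256)` from its only caller, while `shared` and `leaf` end up with `(64, 1024)` because they can run under either kernel. Ranges only ever widen, so the worklist terminates even when the call graph contains cycles. The right merge rule for a real attribute depends on that attribute's semantics and may differ from the enclosing-range rule assumed here.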
Diffstat (limited to 'offload')
0 files changed, 0 insertions, 0 deletions