| author | Shilei Tian <i@tianshilei.me> | 2026-04-28 00:33:42 -0400 |
|---|---|---|
| committer | Shilei Tian <i@tianshilei.me> | 2026-04-28 21:15:00 -0400 |
| commit | 630bff8a2248da1873f27060d17301b5a5606ebb (patch) | |
| tree | b52eb68f1b6aefc18a1fce7d7d63d12aa62af817 /offload/README.md | |
| parent | 383733ea8d15524517b0f1f15c8380c24f17407d (diff) | |
[RFC][AMDGPU] Add AMDGPU_SUMMARY bitcode block for ThinLTO (users/shiltian/amdgpu-thinlto-summary-block)

With AMDGPU object linking, device functions are compiled separately from the
kernels that call them. Without whole-program visibility, the compiler must be
conservative about occupancy for every device function, leading to suboptimal
resource usage. However, GPU kernels typically carry explicit occupancy control
attributes that constrain the launch environment. ThinLTO is the natural place
to propagate these kernel attributes to callees: the combined module summary
index contains a cross-TU call graph, allowing occupancy information to be
propagated top-down from kernels to all reachable device functions. The backend
can then generate better code with the propagated constraints, achieving
whole-program awareness without the compile-time overhead of full LTO.

This patch introduces a dedicated AMDGPU_SUMMARY bitcode block that serializes
per-function summary data alongside the standard module summary. The block is
scoped to AMDGPU so that non-AMDGPU targets are completely unaffected. A
follow-up patch will add the ThinLTO propagation logic that reads these
summaries and applies conservative attribute bounds to device functions
reachable from multiple kernels.
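
The top-down propagation described above can be sketched as follows. This is an illustration only, not code from this patch or the follow-up: `OccupancyRange`, `CallGraph`, `mergeConservative`, and `propagateOccupancy` are hypothetical names, the call graph is a plain map rather than the combined module summary index, and "conservative" is assumed to mean widening a callee's range until it covers every kernel that can reach it.

```cpp
#include <algorithm>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical occupancy bound: an inclusive [Min, Max] range.
struct OccupancyRange {
  unsigned Min;
  unsigned Max;
};

// Hypothetical stand-in for the cross-TU call graph: caller -> callees.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Assumption: the conservative merge widens A until it also covers B.
OccupancyRange mergeConservative(OccupancyRange A, OccupancyRange B) {
  return {std::min(A.Min, B.Min), std::max(A.Max, B.Max)};
}

// Walk the call graph from each kernel and record, for every reachable
// device function, a range that is valid for all kernels reaching it.
std::map<std::string, OccupancyRange>
propagateOccupancy(const CallGraph &CG,
                   const std::map<std::string, OccupancyRange> &Kernels) {
  std::map<std::string, OccupancyRange> Result;
  for (const auto &[Kernel, Range] : Kernels) {
    // Visited guards against cycles and repeated work within one kernel.
    std::set<std::string> Visited{Kernel};
    std::queue<std::string> Worklist;
    Worklist.push(Kernel);
    while (!Worklist.empty()) {
      std::string Caller = Worklist.front();
      Worklist.pop();
      auto It = CG.find(Caller);
      if (It == CG.end())
        continue; // Leaf function: nothing to propagate further.
      for (const std::string &Callee : It->second) {
        if (!Visited.insert(Callee).second)
          continue;
        // First kernel to reach Callee sets the range; later ones widen it.
        auto [Pos, Inserted] = Result.emplace(Callee, Range);
        if (!Inserted)
          Pos->second = mergeConservative(Pos->second, Range);
        Worklist.push(Callee);
      }
    }
  }
  return Result;
}
```

In this sketch, a device function called only from a kernel with range [4, 8] inherits [4, 8], while one reachable from kernels with ranges [4, 8] and [2, 6] ends up with [2, 8].
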
Diffstat (limited to 'offload/README.md')
0 files changed, 0 insertions, 0 deletions
