author     Shilei Tian <i@tianshilei.me>  2026-04-15 22:27:49 -0400
committer  Shilei Tian <i@tianshilei.me>  2026-04-24 17:46:26 -0400
commit     2826b518dfbfa81b01eb8c927faa67b439760237 (patch)
tree       7d5a9a950b8d081cccf9a78685317f66875f33de
parent     b49855fc5684eebd47d177df1fbfbd329653bbd1 (diff)
[AMDGPU] Add `.amdgpu.info` section for per-function metadata (branch: users/shiltian/amdgpu-function-info)
AMDGPU object linking requires the linker to propagate resource usage (registers, stack, LDS) across translation units. To support this, the compiler must emit per-function metadata and call graph edges in the relocatable object so the linker can compute whole-program resource requirements.

This PR introduces a `.amdgpu.info` ELF section using a tagged, length-prefixed binary format; each entry is encoded as:

```
[kind: u8] [len: u8] [payload: <len> bytes]
```

A function scope is opened by an `INFO_FUNC` entry (containing a symbol reference), followed by per-function attributes (register counts, flags, private segment size) and relational edges (direct calls, LDS uses, indirect call signatures). String data such as function type signatures is stored in a companion `.amdgpu.strtab` section.

The format is forward-compatible: a consumer that encounters an unknown kind can skip it by reading the length byte, allowing new entry kinds to be added without breaking existing toolchains.
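The skip-unknown rule keeps a conforming reader short. A minimal standalone sketch of such a reader (hypothetical code, not part of this patch; `InfoEntry` and `parseInfoSection` are illustrative names):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative decoded entry: kind byte plus raw payload bytes.
struct InfoEntry {
  uint8_t Kind;
  std::vector<uint8_t> Payload;
};

// Walk the [kind: u8][len: u8][payload] stream. Unknown kinds are kept
// as opaque payloads rather than rejected, which is exactly the
// forward-compatibility property the format is designed for.
std::vector<InfoEntry> parseInfoSection(const uint8_t *Data, size_t Size) {
  std::vector<InfoEntry> Entries;
  size_t Off = 0;
  while (Off + 2 <= Size) {
    InfoEntry E;
    E.Kind = Data[Off];
    uint8_t Len = Data[Off + 1];
    if (Off + 2 + Len > Size)
      break; // truncated final entry; stop rather than read out of bounds
    E.Payload.assign(Data + Off + 2, Data + Off + 2 + Len);
    Entries.push_back(E);
    Off += 2 + Len;
  }
  return Entries;
}
```

A consumer dispatching on `Kind` would handle the values it recognizes and simply ignore the rest.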
-rw-r--r--  llvm/docs/AMDGPUUsage.rst                                      106
-rw-r--r--  llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h                74
-rw-r--r--  llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp                    157
-rw-r--r--  llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h                       10
-rw-r--r--  llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp                     3
-rw-r--r--  llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp           113
-rw-r--r--  llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp   179
-rw-r--r--  llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h      30
-rw-r--r--  llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-agpr.ll          23
-rw-r--r--  llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-callgraph.ll     62
-rw-r--r--  llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll 29
-rw-r--r--  llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-typeid.ll       221
-rw-r--r--  llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll               47
-rw-r--r--  llvm/test/MC/AMDGPU/amdgpu-info-err.s                           43
-rw-r--r--  llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s                    126
15 files changed, 1209 insertions, 14 deletions
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 1f7f1f92f5e2..dca7b9accded 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -2838,6 +2838,8 @@ An AMDGPU target ELF code object has the standard ELF sections which include:
``.strtab`` ``SHT_STRTAB`` *none*
``.symtab`` ``SHT_SYMTAB`` *none*
``.text`` ``SHT_PROGBITS`` ``SHF_ALLOC`` + ``SHF_EXECINSTR``
+ ``.amdgpu.info`` ``SHT_PROGBITS`` ``SHF_EXCLUDE``
+ ``.amdgpu.strtab`` ``SHT_STRTAB`` ``SHF_EXCLUDE``
================== ================ =================================
These sections have their standard meanings (see [ELF]_) and are only generated
@@ -2873,6 +2875,67 @@ if needed.
``.amdgpu.kernel.runtime.handle``
Symbols used for device enqueue.
+.. _amdgpu-info-section:
+
+``.amdgpu.info``
+ Per-function metadata for AMDGPU object linking, emitted only in relocatable
+ code objects when object linking is enabled
+ (``-amdgpu-enable-object-linking``). The linker uses this section to
+ propagate resource usage (registers, stack, LDS) and resolve call graph
+ dependencies across translation units.
+
+ Each entry uses a tagged, length-prefixed binary encoding:
+
+ .. code-block:: none
+
+ [kind: u8] [len: u8] [payload: <len> bytes]
+
+ A function scope is opened by an ``INFO_FUNC`` entry whose payload is an
+ 8-byte relocated symbol reference. All subsequent entries until the next
+ ``INFO_FUNC`` or end of section belong to that scope. The format is
+ forward-compatible: unknown kinds can be skipped by reading the length byte.
+
+ .. table:: AMDGPU Info Entry Kinds
+ :name: amdgpu-info-entry-kinds-table
+
+ ===== ============================== ==========================================
+ Value Name Payload
+ ===== ============================== ==========================================
+ 1 ``INFO_FUNC`` 8B symbol ref; opens function scope
+ 2 ``INFO_FLAGS`` u32; ``FuncInfoFlags`` bitfield
+ 3 ``INFO_NUM_SGPR`` u32; SGPRs explicitly used
+ 4 ``INFO_NUM_VGPR`` u32; architectural VGPRs used
+ 5 ``INFO_NUM_AGPR`` u32; accumulator VGPRs (AGPRs) used
+ 6 ``INFO_PRIVATE_SEGMENT_SIZE`` u32; private (scratch) segment bytes
+ 7 ``INFO_USE`` 8B symbol ref; resource dependency edge
+ 8 ``INFO_CALL`` 8B symbol ref; direct call edge
+ 9 ``INFO_INDIRECT_CALL`` u32 strtab offset; indirect call type-ID
+ 10 ``INFO_TYPEID`` u32 strtab offset; function type-ID
+ ===== ============================== ==========================================
+
+ .. table:: AMDGPU Info Function Flags (``INFO_FLAGS``)
+ :name: amdgpu-info-flags-table
+
+ ===== =========================== ==========================================
+ Bit Name Description
+ ===== =========================== ==========================================
+ 0x1 ``FUNC_USES_VCC`` Function uses the VCC register
+ 0x2 ``FUNC_USES_FLAT_SCRATCH`` Function uses flat scratch addressing
+ 0x4 ``FUNC_HAS_DYN_STACK`` Function has dynamic stack allocation
+ ===== =========================== ==========================================
+
+ Symbol references (``INFO_FUNC``, ``INFO_USE``, ``INFO_CALL``) generate
+ ``R_AMDGPU_ABS64`` relocations in ``.rela.amdgpu.info``. String payloads
+ (``INFO_INDIRECT_CALL``, ``INFO_TYPEID``) store a ``u32`` offset into
+ the companion ``.amdgpu.strtab`` section.
+
+ See :ref:`amdgpu-assembler-directive-amdgpu-info` for the assembly syntax.
+
+``.amdgpu.strtab``
+ Null-terminated string pool for the ``.amdgpu.info`` section. Contains
+ type-ID strings referenced by ``INFO_INDIRECT_CALL`` and ``INFO_TYPEID``
+ entries. Only present when ``.amdgpu.info`` requires string data.
+
.. _amdgpu-note-records:
Note Records
@@ -21766,6 +21829,49 @@ semantics described in :ref:`amdgpu-amdhsa-code-object-metadata-v3`,
This directive is terminated by an ``.end_amdgpu_metadata`` directive.
+.. _amdgpu-assembler-directive-amdgpu-info:
+
+.amdgpu_info <symbol>
++++++++++++++++++++++
+
+Begins a per-function metadata block for ``<symbol>`` in the ``.amdgpu.info``
+section (see :ref:`amdgpu-info-section`). Only valid when the OS is ``amdhsa``.
+The block is terminated by an ``.end_amdgpu_info`` directive.
+
+The following sub-directives may appear inside the block:
+
+ .. table:: .amdgpu_info Sub-Directives
+ :name: amdgpu-info-sub-directives-table
+
+ ====================================== ==========================================
+ Directive Description
+ ====================================== ==========================================
+ ``.amdgpu_flags`` *value* ``FuncInfoFlags`` bitfield (u32)
+ ``.amdgpu_num_sgpr`` *value* SGPRs explicitly used (u32)
+ ``.amdgpu_num_vgpr`` *value* Architectural VGPRs used (u32)
+ ``.amdgpu_num_agpr`` *value* Accumulator VGPRs used (u32)
+ ``.amdgpu_private_segment_size`` *n* Private segment size in bytes (u32)
+ ``.amdgpu_use`` *symbol* Resource dependency (LDS or barrier)
+ ``.amdgpu_call`` *symbol* Direct call edge to *symbol*
+ ``.amdgpu_indirect_call`` *"type-id"* Indirect call with given type-ID string
+ ``.amdgpu_typeid`` *"type-id"* Type-ID for an address-taken function
+ ====================================== ==========================================
+
+Example:
+
+.. code-block:: nasm
+
+ .amdgpu_info my_kernel
+ .amdgpu_flags 7
+ .amdgpu_num_sgpr 33
+ .amdgpu_num_vgpr 32
+ .amdgpu_num_agpr 0
+ .amdgpu_private_segment_size 0
+ .amdgpu_use lds_var
+ .amdgpu_call helper
+ .amdgpu_indirect_call "vi"
+ .end_amdgpu_info
+
.. _amdgpu-amdhsa-assembler-example-v3-onwards:
Code Object V3 and Above Example Source Code
diff --git a/llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h b/llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h
new file mode 100644
index 000000000000..e65161e6545f
--- /dev/null
+++ b/llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h
@@ -0,0 +1,74 @@
+//===--- AMDGPUObjLinkingInfo.h ---------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// Enums shared between the AMDGPU backend (LLVM) and the ELF linker (LLD)
+/// for the `.amdgpu.info` object-linking metadata section.
+///
+/// Binary layout of each entry: [kind: u8] [len: u8] [payload: <len> bytes].
+/// Unknown kinds are forward-compatible: a consumer skips them by reading len.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_AMDGPUOBJECTLINKINGINFO_H
+#define LLVM_SUPPORT_AMDGPUOBJECTLINKINGINFO_H
+
+#include "llvm/ADT/BitmaskEnum.h"
+
+#include <cstdint>
+
+namespace llvm {
+namespace AMDGPU {
+
+/// Entry kind values for the `.amdgpu.info` section.
+///
+/// Entries that appear between an INFO_FUNC and the next INFO_FUNC (or end of
+/// section) belong to the function scope opened by that INFO_FUNC.
+enum class InfoKind : uint8_t {
+ /// Opens a new function scope. Payload is an 8-byte symbol reference
+ /// (relocated) identifying the function. All subsequent entries until the
+ /// next INFO_FUNC belong to this function.
+ INFO_FUNC = 1,
+ /// Bitfield of FuncInfoFlags properties for the function. [u32]
+ INFO_FLAGS = 2,
+ /// Number of SGPRs explicitly used by the function. [u32]
+ INFO_NUM_SGPR = 3,
+ /// Number of architectural VGPRs used by the function. [u32]
+ INFO_NUM_VGPR = 4,
+ /// Number of accumulator VGPRs (AGPRs) used by the function. [u32]
+ INFO_NUM_AGPR = 5,
+ /// Private (scratch) memory size in bytes required by the function. [u32]
+ INFO_PRIVATE_SEGMENT_SIZE = 6,
+ /// Dependency edge: the function uses the resource identified by the
+ /// 8-byte relocated symbol (e.g. an LDS variable or named barrier).
+ INFO_USE = 7,
+ /// Direct call edge: the function calls the callee identified by the
+ /// 8-byte relocated symbol.
+ INFO_CALL = 8,
+ /// Indirect call edge: the function contains an indirect call whose
+ /// callee is expected to match the type-ID string at the given
+ /// `.amdgpu.strtab` offset. [u32]
+ INFO_INDIRECT_CALL = 9,
+ /// Function type ID: tags an address-taken function with a type-ID
+ /// string (at the given `.amdgpu.strtab` offset) so the linker can match
+ /// it against INFO_INDIRECT_CALL entries. [u32]
+ INFO_TYPEID = 10,
+};
+
+/// Per-function flags packed into INFO_FLAGS entries.
+enum class FuncInfoFlags : uint32_t {
+ FUNC_USES_VCC = 1U << 0,
+ FUNC_USES_FLAT_SCRATCH = 1U << 1,
+ FUNC_HAS_DYN_STACK = 1U << 2,
+ LLVM_MARK_AS_BITMASK_ENUM(/*LargestValue=*/FUNC_HAS_DYN_STACK),
+};
+
+} // namespace AMDGPU
+} // namespace llvm
+
+#endif // LLVM_SUPPORT_AMDGPUOBJECTLINKINGINFO_H
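For a consumer that decodes a raw `INFO_FLAGS` payload without pulling in `llvm/ADT/BitmaskEnum.h`, the three bits can be mirrored in a plain enum. A sketch under that assumption (hypothetical helper; the values mirror `FuncInfoFlags` above):

```cpp
#include <cstdint>

// Mirror of the FuncInfoFlags bit values from AMDGPUObjLinkingInfo.h,
// usable directly on a raw u32 INFO_FLAGS payload.
enum FuncInfoFlagBits : uint32_t {
  FUNC_USES_VCC = 1u << 0,
  FUNC_USES_FLAT_SCRATCH = 1u << 1,
  FUNC_HAS_DYN_STACK = 1u << 2,
};

struct DecodedFuncFlags {
  bool UsesVCC;
  bool UsesFlatScratch;
  bool HasDynStack;
};

// Unpack the bitfield into individual booleans.
DecodedFuncFlags decodeFuncInfoFlags(uint32_t Raw) {
  return {(Raw & FUNC_USES_VCC) != 0,
          (Raw & FUNC_USES_FLAT_SCRATCH) != 0,
          (Raw & FUNC_HAS_DYN_STACK) != 0};
}
```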
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index 718b2b154e25..2e5e9ef0a3f5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -32,6 +32,7 @@
#include "Utils/AMDGPUBaseInfo.h"
#include "Utils/AMDKernelCodeTUtils.h"
#include "Utils/SIDefinesUtils.h"
+#include "llvm/ADT/StringSet.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/BinaryFormat/ELF.h"
#include "llvm/CodeGen/AsmPrinterHandler.h"
@@ -537,6 +538,133 @@ void AMDGPUAsmPrinter::validateMCResourceInfo(Function &F) {
}
}
+static void appendTypeEncoding(std::string &Enc, Type *Ty,
+ const DataLayout &DL) {
+ if (Ty->isVoidTy()) {
+ Enc += 'v';
+ return;
+ }
+ unsigned Bits = DL.getTypeSizeInBits(Ty);
+ // Zero-sized non-void types (e.g. `{}` or `[0 x i8]`) would collapse to the
+ // same encoding as i32, which would silently conflate distinct signatures.
+ // These aren't expected in indirect-callable signatures in practice.
+ assert(Bits > 0 && "Unexpected zero-sized non-void type in type-ID "
+ "encoding; encoding would be ambiguous");
+ if (Bits <= 32)
+ Enc += 'i';
+ else if (Bits <= 64)
+ Enc += 'l';
+ else
+ Enc.append(divideCeil(Bits, 32), 'i');
+}
+
+static std::string computeTypeId(const FunctionType *FTy,
+ const DataLayout &DL) {
+ std::string Enc;
+ appendTypeEncoding(Enc, FTy->getReturnType(), DL);
+ for (Type *ParamTy : FTy->params())
+ appendTypeEncoding(Enc, ParamTy, DL);
+ return Enc;
+}
+
+void AMDGPUAsmPrinter::collectCallEdge(const MachineInstr &MI) {
+ if (!AMDGPUTargetMachine::EnableObjectLinking)
+ return;
+ const SIInstrInfo *TII = MF->getSubtarget<GCNSubtarget>().getInstrInfo();
+ const MachineOperand *Callee =
+ TII->getNamedOperand(MI, AMDGPU::OpName::callee);
+ if (!Callee || !Callee->isGlobal())
+ return;
+ DirectCallEdges.insert(
+ {getSymbol(&MF->getFunction()), getSymbol(Callee->getGlobal())});
+}
+
+void AMDGPUAsmPrinter::emitAMDGPUInfo(Module &M) {
+ if (!AMDGPUTargetMachine::EnableObjectLinking)
+ return;
+
+ const NamedMDNode *LDSMD = M.getNamedMetadata("amdgpu.lds.uses");
+ bool HasLDSUses = LDSMD && LDSMD->getNumOperands() > 0;
+
+ const NamedMDNode *BarMD = M.getNamedMetadata("amdgpu.named_barrier.uses");
+ bool HasNamedBarriers = BarMD && BarMD->getNumOperands() > 0;
+
+ // Collect address-taken functions (with type IDs) and indirect call sites.
+ DenseMap<const Function *, std::string> AddrTakenTypeIds;
+ using IndirectCallInfo = std::pair<const Function *, std::string>;
+ SmallVector<IndirectCallInfo, 8> IndirectCalls;
+
+ for (const Function &F : M) {
+ bool IsKernel = AMDGPU::isKernel(F.getCallingConv());
+
+ if (!IsKernel && F.hasAddressTaken(/*PutOffender=*/nullptr,
+ /*IgnoreCallbackUses=*/false,
+ /*IgnoreAssumeLikeCalls=*/true,
+ /*IgnoreLLVMUsed=*/true)) {
+ AddrTakenTypeIds[&F] =
+ computeTypeId(F.getFunctionType(), M.getDataLayout());
+ }
+
+ if (F.isDeclaration())
+ continue;
+
+ StringSet<> SeenTypeIds;
+ for (const BasicBlock &BB : F) {
+ for (const Instruction &I : BB) {
+ const auto *CB = dyn_cast<CallBase>(&I);
+ if (!CB || !CB->isIndirectCall())
+ continue;
+ std::string TId =
+ computeTypeId(CB->getFunctionType(), M.getDataLayout());
+ if (SeenTypeIds.insert(TId).second)
+ IndirectCalls.push_back({&F, std::move(TId)});
+ }
+ }
+ }
+
+ if (FunctionInfos.empty() && DirectCallEdges.empty() && !HasLDSUses &&
+ !HasNamedBarriers && AddrTakenTypeIds.empty() && IndirectCalls.empty())
+ return;
+
+ AMDGPU::InfoSectionData Data;
+ Data.Funcs = std::move(FunctionInfos);
+
+ for (auto &[F, TypeId] : AddrTakenTypeIds) {
+ MCSymbol *Sym = getSymbol(F);
+ Data.TypeIds.push_back({Sym, TypeId});
+ }
+
+ for (auto &[CallerSym, CalleeSym] : DirectCallEdges)
+ Data.Calls.push_back({CallerSym, CalleeSym});
+ DirectCallEdges.clear();
+
+ if (HasLDSUses) {
+ for (const MDNode *N : LDSMD->operands()) {
+ auto *Func = mdconst::extract<Function>(N->getOperand(0));
+ auto *LdsVar = mdconst::extract<GlobalVariable>(N->getOperand(1));
+ Data.Uses.push_back({getSymbol(Func), getSymbol(LdsVar)});
+ }
+ }
+
+ if (HasNamedBarriers) {
+ for (const MDNode *N : BarMD->operands()) {
+ auto *BarVar = mdconst::extract<GlobalVariable>(N->getOperand(0));
+ MCSymbol *BarSym = getSymbol(BarVar);
+ for (unsigned I = 1, E = N->getNumOperands(); I < E; ++I) {
+ auto *Func = mdconst::extract<Function>(N->getOperand(I));
+ Data.Uses.push_back({getSymbol(Func), BarSym});
+ }
+ }
+ }
+
+ for (auto &[Caller, Enc] : IndirectCalls) {
+ MCSymbol *CallerSym = getSymbol(Caller);
+ Data.IndirectCalls.push_back({CallerSym, Enc});
+ }
+
+ getTargetStreamer()->emitAMDGPUInfo(Data);
+}
+
bool AMDGPUAsmPrinter::doFinalization(Module &M) {
// Pad with s_code_end to help tools and guard against instruction prefetch
// causing stale data in caches. Arguably this should be done by the linker,
@@ -553,6 +681,10 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) {
}
}
+ // Emit the unified .amdgpu.info section (per-function resources, call graph,
+ // LDS/named-barrier use edges, indirect calls, and address-taken type IDs).
+ emitAMDGPUInfo(M);
+
// Assign expressions which can only be resolved when all other functions are
// known.
RI.finalize(OutContext);
@@ -567,8 +699,15 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) {
RI.getMaxSGPRSymbol(OutContext), RI.getMaxNamedBarrierSymbol(OutContext));
OutStreamer->popSection();
- for (Function &F : M.functions())
- validateMCResourceInfo(F);
+ // In the object-linking pipeline per-function resource MCExprs reference
+ // external callee symbols that cannot be evaluated here, so cross-TU limit
+ // checks would silently no-op for every non-leaf function. Defer resource
+ // sanity checking to the linker, which re-validates against the aggregated
+ // call graph in the combined .amdgpu.info metadata.
+ if (!AMDGPUTargetMachine::EnableObjectLinking) {
+ for (Function &F : M.functions())
+ validateMCResourceInfo(F);
+ }
RI.reset();
@@ -729,6 +868,20 @@ bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
RI.gatherResourceInfo(MF, *ResourceUsage, OutContext);
+ if (AMDGPUTargetMachine::EnableObjectLinking) {
+ const AMDGPUResourceUsageAnalysisWrapperPass::FunctionResourceInfo &RU =
+ *ResourceUsage;
+ FunctionInfos.push_back(
+ {/*NumSGPR=*/static_cast<uint32_t>(RU.NumExplicitSGPR),
+ /*NumArchVGPR=*/static_cast<uint32_t>(RU.NumVGPR),
+ /*NumAccVGPR=*/static_cast<uint32_t>(RU.NumAGPR),
+ /*PrivateSegmentSize=*/static_cast<uint32_t>(RU.PrivateSegmentSize),
+ /*UsesVCC=*/RU.UsesVCC,
+ /*UsesFlatScratch=*/RU.UsesFlatScratch,
+ /*HasDynStack=*/RU.HasDynamicallySizedStack,
+ /*Sym=*/getSymbol(&MF.getFunction())});
+ }
+
if (MFI->isModuleEntryFunction()) {
getSIProgramInfo(CurrentProgramInfo, MF);
}
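The mangling in `appendTypeEncoding`/`computeTypeId` above can be sketched independently of `llvm::Type` by working on bit widths directly (hypothetical helper, not part of this patch; `0` stands in for `void`): 'v' for void, 'i' for types up to 32 bits, 'l' for up to 64 bits, and wider types decompose into ceil(bits/32) 'i' characters.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for computeTypeId that takes bit widths instead
// of llvm::Type. 0 represents void; other widths follow the same rules
// as appendTypeEncoding.
std::string computeTypeIdFromBits(unsigned RetBits,
                                  const std::vector<unsigned> &ParamBits) {
  auto Append = [](std::string &Enc, unsigned Bits) {
    if (Bits == 0)
      Enc += 'v'; // void
    else if (Bits <= 32)
      Enc += 'i';
    else if (Bits <= 64)
      Enc += 'l';
    else
      Enc.append((Bits + 31) / 32, 'i'); // wide types split into 32-bit chunks
  };
  std::string Enc;
  Append(Enc, RetBits);
  for (unsigned Bits : ParamBits)
    Append(Enc, Bits);
  return Enc;
}
```

Under this scheme a `void(i32)` signature encodes as `"vi"`, matching the `.amdgpu_indirect_call "vi"` example in the documentation.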
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
index 31d10fe92ca2..9066b2d419f8 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
@@ -15,7 +15,10 @@
#define LLVM_LIB_TARGET_AMDGPU_AMDGPUASMPRINTER_H
#include "AMDGPUMCResourceInfo.h"
+#include "AMDGPUResourceUsageAnalysis.h"
+#include "MCTargetDesc/AMDGPUTargetStreamer.h"
#include "SIProgramInfo.h"
+#include "llvm/ADT/SetVector.h"
#include "llvm/CodeGen/AsmPrinter.h"
namespace llvm {
@@ -86,6 +89,13 @@ private:
void initTargetStreamer(Module &M);
+ void emitAMDGPUInfo(Module &M);
+ void collectCallEdge(const MachineInstr &MI);
+
+ SetVector<std::pair<MCSymbol *, MCSymbol *>> DirectCallEdges;
+
+ SmallVector<AMDGPU::FuncInfo, 8> FunctionInfos;
+
SmallString<128> getMCExprStr(const MCExpr *Value);
/// Attempts to replace the validation that is missed in getSIProgramInfo due
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
index 56592bde3b1c..3c89e3d287b3 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
@@ -320,6 +320,9 @@ static void emitVGPRBlockComment(const MachineInstr *MI, const SIInstrInfo *TII,
}
void AMDGPUAsmPrinter::emitInstruction(const MachineInstr *MI) {
+ if (MI->isCall())
+ collectCallEdge(*MI);
+
// FIXME: Enable feature predicate checks once all the test pass.
// AMDGPU_MC::verifyInstructionPredicates(MI->getOpcode(),
// getSubtargetInfo().getFeatureBits());
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 2e300733b5c9..5d11e8a66c7c 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -38,6 +38,7 @@
#include "llvm/MC/MCSymbol.h"
#include "llvm/MC/TargetRegistry.h"
#include "llvm/Support/AMDGPUMetadata.h"
+#include "llvm/Support/AMDGPUObjLinkingInfo.h"
#include "llvm/Support/AMDHSAKernelDescriptor.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/Compiler.h"
@@ -1382,6 +1383,9 @@ class AMDGPUAsmParser : public MCTargetAsmParser {
return getRegBitWidth(RCID) / 8;
}
+ AMDGPU::InfoSectionData InfoData;
+ bool HasInfoData = false;
+
private:
void createConstantSymbol(StringRef Id, int64_t Val);
@@ -1422,6 +1426,7 @@ private:
bool ParseDirectivePALMetadataBegin();
bool ParseDirectivePALMetadata();
bool ParseDirectiveAMDGPULDS();
+ bool ParseDirectiveAMDGPUInfo();
/// Common code to parse out a block of text (typically YAML) between start and
/// end directives.
@@ -1676,6 +1681,7 @@ public:
uint64_t &ErrorInfo,
bool MatchingInlineAsm) override;
bool ParseDirective(AsmToken DirectiveID) override;
+ void onEndOfFile() override;
ParseStatus parseOperand(OperandVector &Operands, StringRef Mnemonic,
OperandMode Mode = OperandMode_Default);
StringRef parseMnemonicSuffix(StringRef Name);
@@ -6741,6 +6747,110 @@ bool AMDGPUAsmParser::ParseDirectiveAMDGPULDS() {
return false;
}
+bool AMDGPUAsmParser::ParseDirectiveAMDGPUInfo() {
+ if (getParser().checkForValidSection())
+ return true;
+
+ StringRef FuncName;
+ if (getParser().parseIdentifier(FuncName))
+ return TokError("expected symbol name after .amdgpu_info");
+
+ MCSymbol *FuncSym = getContext().getOrCreateSymbol(FuncName);
+ AMDGPU::FuncInfo FI;
+ FI.Sym = FuncSym;
+ bool HasScalarAttrs = false;
+
+ while (true) {
+ while (trySkipToken(AsmToken::EndOfStatement))
+ ;
+
+ StringRef ID;
+ SMLoc IDLoc = getLoc();
+ if (!parseId(ID, "expected directive or .end_amdgpu_info"))
+ return true;
+
+ if (ID == ".end_amdgpu_info")
+ break;
+
+ // Every per-entry directive shares the `.amdgpu_` namespace prefix; strip
+ // it once and dispatch on the distinguishing suffix below. The unstripped
+ // ID is preserved for diagnostics.
+ StringRef Dir = ID;
+ if (!Dir.consume_front(".amdgpu_"))
+ return Error(IDLoc, "unknown .amdgpu_info directive '" + ID + "'");
+
+ if (Dir == "flags") {
+ int64_t Val;
+ if (getParser().parseAbsoluteExpression(Val))
+ return true;
+ FI.UsesVCC =
+ (Val & to_underlying(AMDGPU::FuncInfoFlags::FUNC_USES_VCC)) != 0;
+ FI.UsesFlatScratch =
+ (Val & to_underlying(AMDGPU::FuncInfoFlags::FUNC_USES_FLAT_SCRATCH)) != 0;
+ FI.HasDynStack =
+ (Val & to_underlying(AMDGPU::FuncInfoFlags::FUNC_HAS_DYN_STACK)) != 0;
+ HasScalarAttrs = true;
+ } else if (Dir == "num_sgpr") {
+ int64_t Val;
+ if (getParser().parseAbsoluteExpression(Val))
+ return true;
+ FI.NumSGPR = static_cast<uint32_t>(Val);
+ HasScalarAttrs = true;
+ } else if (Dir == "num_vgpr") {
+ int64_t Val;
+ if (getParser().parseAbsoluteExpression(Val))
+ return true;
+ FI.NumArchVGPR = static_cast<uint32_t>(Val);
+ HasScalarAttrs = true;
+ } else if (Dir == "num_agpr") {
+ int64_t Val;
+ if (getParser().parseAbsoluteExpression(Val))
+ return true;
+ FI.NumAccVGPR = static_cast<uint32_t>(Val);
+ HasScalarAttrs = true;
+ } else if (Dir == "private_segment_size") {
+ int64_t Val;
+ if (getParser().parseAbsoluteExpression(Val))
+ return true;
+ FI.PrivateSegmentSize = static_cast<uint32_t>(Val);
+ HasScalarAttrs = true;
+ } else if (Dir == "use") {
+ StringRef ResName;
+ if (getParser().parseIdentifier(ResName))
+ return TokError("expected resource symbol for .amdgpu_use");
+ InfoData.Uses.push_back(
+ {FuncSym, getContext().getOrCreateSymbol(ResName)});
+ } else if (Dir == "call") {
+ StringRef DstName;
+ if (getParser().parseIdentifier(DstName))
+ return TokError("expected callee symbol for .amdgpu_call");
+ InfoData.Calls.push_back(
+ {FuncSym, getContext().getOrCreateSymbol(DstName)});
+ } else if (Dir == "indirect_call") {
+ std::string TypeId;
+ if (getParser().parseEscapedString(TypeId))
+ return TokError("expected type ID string for .amdgpu_indirect_call");
+ InfoData.IndirectCalls.push_back({FuncSym, std::move(TypeId)});
+ } else if (Dir == "typeid") {
+ std::string TypeId;
+ if (getParser().parseEscapedString(TypeId))
+ return TokError("expected type ID string for .amdgpu_typeid");
+ InfoData.TypeIds.push_back({FuncSym, std::move(TypeId)});
+ } else {
+ return Error(IDLoc, "unknown .amdgpu_info directive '" + ID + "'");
+ }
+ }
+
+ if (HasScalarAttrs)
+ InfoData.Funcs.push_back(std::move(FI));
+ HasInfoData = true;
+ return false;
+}
+
+void AMDGPUAsmParser::onEndOfFile() {
+ if (HasInfoData)
+ getTargetStreamer().emitAMDGPUInfo(InfoData);
+}
+
bool AMDGPUAsmParser::ParseDirective(AsmToken DirectiveID) {
StringRef IDVal = DirectiveID.getString();
@@ -6778,6 +6888,9 @@ bool AMDGPUAsmParser::ParseDirective(AsmToken DirectiveID) {
if (IDVal == ".amdgpu_lds")
return ParseDirectiveAMDGPULDS();
+ if (IDVal == ".amdgpu_info")
+ return ParseDirectiveAMDGPUInfo();
+
if (IDVal == PALMD::AssemblerDirectiveBegin)
return ParseDirectivePALMetadataBegin();
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
index d276bab0ff3b..aefe11608d15 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
@@ -25,7 +25,9 @@
#include "llvm/MC/MCELFObjectWriter.h"
#include "llvm/MC/MCELFStreamer.h"
#include "llvm/MC/MCSubtargetInfo.h"
+#include "llvm/MC/StringTableBuilder.h"
#include "llvm/Support/AMDGPUMetadata.h"
+#include "llvm/Support/AMDGPUObjLinkingInfo.h"
#include "llvm/Support/AMDHSAKernelDescriptor.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/FormattedStream.h"
@@ -664,6 +666,103 @@ void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
OS << "\t.end_amdhsa_kernel\n";
}
+namespace {
+/// Callback type invoked by \c forEachInfoScope for each function scope in
+/// the canonical iteration order. The scope is emitted exactly once per
+/// unique \p Sym regardless of how many flat entries reference it.
+using InfoScopeEmitter = function_ref<void(
+ MCSymbol *Sym, const AMDGPU::FuncInfo *Info, ArrayRef<MCSymbol *> Uses,
+ ArrayRef<MCSymbol *> Calls, ArrayRef<StringRef> IndirectCallTypeIds,
+ ArrayRef<StringRef> TypeIds)>;
+
+/// Group the flat edge lists in \p Data by source function symbol and drive
+/// per-scope emission. A scope is opened for every function with attached
+/// info and for every function that appears only as an edge source; each
+/// scope is emitted exactly once. Both the asm and ELF streamers share this
+/// iteration logic and only differ in the per-scope emission callback.
+static void forEachInfoScope(const AMDGPU::InfoSectionData &Data,
+ InfoScopeEmitter Emit) {
+ DenseMap<MCSymbol *, SmallVector<MCSymbol *, 2>> FuncUses;
+ DenseMap<MCSymbol *, SmallVector<MCSymbol *, 4>> FuncCalls;
+ DenseMap<MCSymbol *, SmallVector<StringRef, 2>> FuncIndirectCalls;
+ DenseMap<MCSymbol *, SmallVector<StringRef, 1>> FuncTypeIds;
+ for (const auto &[Func, Res] : Data.Uses)
+ FuncUses[Func].push_back(Res);
+ for (const auto &[Src, Dst] : Data.Calls)
+ FuncCalls[Src].push_back(Dst);
+ for (const auto &[Func, TypeId] : Data.IndirectCalls)
+ FuncIndirectCalls[Func].push_back(TypeId);
+ for (const auto &[Sym, TypeId] : Data.TypeIds)
+ FuncTypeIds[Sym].push_back(TypeId);
+
+ DenseSet<MCSymbol *> Emitted;
+ auto EmitIfNew = [&](MCSymbol *Sym, const AMDGPU::FuncInfo *Info) {
+ if (!Emitted.insert(Sym).second)
+ return;
+ ArrayRef<MCSymbol *> Uses, Calls;
+ ArrayRef<StringRef> IndirectCallTypeIds, TypeIds;
+ if (auto It = FuncUses.find(Sym); It != FuncUses.end())
+ Uses = It->second;
+ if (auto It = FuncCalls.find(Sym); It != FuncCalls.end())
+ Calls = It->second;
+ if (auto It = FuncIndirectCalls.find(Sym); It != FuncIndirectCalls.end())
+ IndirectCallTypeIds = It->second;
+ if (auto It = FuncTypeIds.find(Sym); It != FuncTypeIds.end())
+ TypeIds = It->second;
+ Emit(Sym, Info, Uses, Calls, IndirectCallTypeIds, TypeIds);
+ };
+
+ for (const AMDGPU::FuncInfo &Func : Data.Funcs)
+ EmitIfNew(Func.Sym, &Func);
+ // Emit scopes for functions that only appear as edge sources (e.g. typeid
+ // tags on address-taken declarations, or callers of external functions).
+ for (const auto &[Sym, TypeId] : Data.TypeIds)
+ EmitIfNew(Sym, nullptr);
+ for (const auto &[Sym, Res] : Data.Uses)
+ EmitIfNew(Sym, nullptr);
+ for (const auto &[Sym, Dst] : Data.Calls)
+ EmitIfNew(Sym, nullptr);
+ for (const auto &[Sym, TypeId] : Data.IndirectCalls)
+ EmitIfNew(Sym, nullptr);
+}
+} // namespace
+
+void AMDGPUTargetAsmStreamer::emitAMDGPUInfo(
+ const AMDGPU::InfoSectionData &Data) {
+ forEachInfoScope(Data, [&](MCSymbol *Sym, const AMDGPU::FuncInfo *Info,
+ ArrayRef<MCSymbol *> Uses,
+ ArrayRef<MCSymbol *> Calls,
+ ArrayRef<StringRef> IndirectCallTypeIds,
+ ArrayRef<StringRef> TypeIds) {
+ OS << "\t.amdgpu_info " << Sym->getName() << '\n';
+ if (Info) {
+ AMDGPU::FuncInfoFlags Flags{};
+ if (Info->UsesVCC)
+ Flags |= AMDGPU::FuncInfoFlags::FUNC_USES_VCC;
+ if (Info->UsesFlatScratch)
+ Flags |= AMDGPU::FuncInfoFlags::FUNC_USES_FLAT_SCRATCH;
+ if (Info->HasDynStack)
+ Flags |= AMDGPU::FuncInfoFlags::FUNC_HAS_DYN_STACK;
+ OS << "\t\t.amdgpu_flags " << llvm::to_underlying(Flags) << '\n';
+ OS << "\t\t.amdgpu_num_sgpr " << Info->NumSGPR << '\n';
+ OS << "\t\t.amdgpu_num_vgpr " << Info->NumArchVGPR << '\n';
+ if (Info->NumAccVGPR)
+ OS << "\t\t.amdgpu_num_agpr " << Info->NumAccVGPR << '\n';
+ OS << "\t\t.amdgpu_private_segment_size " << Info->PrivateSegmentSize
+ << '\n';
+ }
+ for (MCSymbol *Res : Uses)
+ OS << "\t\t.amdgpu_use " << Res->getName() << '\n';
+ for (MCSymbol *Dst : Calls)
+ OS << "\t\t.amdgpu_call " << Dst->getName() << '\n';
+ for (StringRef TypeId : IndirectCallTypeIds)
+ OS << "\t\t.amdgpu_indirect_call \"" << TypeId << "\"\n";
+ for (StringRef TypeId : TypeIds)
+ OS << "\t\t.amdgpu_typeid \"" << TypeId << "\"\n";
+ OS << "\t.end_amdgpu_info\n\n";
+ });
+}
+
//===----------------------------------------------------------------------===//
// AMDGPUTargetELFStreamer
//===----------------------------------------------------------------------===//
@@ -1065,3 +1164,83 @@ void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor(
for (uint32_t i = 0; i < sizeof(amdhsa::kernel_descriptor_t::reserved3); ++i)
Streamer.emitInt8(0u);
}
+
+void AMDGPUTargetELFStreamer::emitAMDGPUInfo(
+ const AMDGPU::InfoSectionData &Data) {
+ MCELFStreamer &S = getStreamer();
+ MCContext &Context = S.getContext();
+
+ StringTableBuilder StrTab(StringTableBuilder::ELF);
+ auto getOrAddString = [&](StringRef Str) -> uint32_t {
+ if (Str.empty())
+ return UINT32_MAX;
+ return StrTab.add(Str);
+ };
+
+ auto EmitU32Entry = [&](AMDGPU::InfoKind Kind, uint32_t Val) {
+ S.emitInt8(static_cast<uint8_t>(Kind));
+ S.emitInt8(4);
+ S.emitInt32(Val);
+ };
+ auto EmitSymEntry = [&](AMDGPU::InfoKind Kind, MCSymbol *Sym) {
+ S.emitInt8(static_cast<uint8_t>(Kind));
+ S.emitInt8(8);
+ S.emitValue(MCSymbolRefExpr::create(Sym, Context), 8);
+ };
+
+ S.pushSection();
+ MCSectionELF *InfoSec = Context.getELFSection(
+ ".amdgpu.info", ELF::SHT_PROGBITS, ELF::SHF_EXCLUDE);
+ S.switchSection(InfoSec);
+
+ forEachInfoScope(Data, [&](MCSymbol *Sym, const AMDGPU::FuncInfo *Info,
+ ArrayRef<MCSymbol *> Uses,
+ ArrayRef<MCSymbol *> Calls,
+ ArrayRef<StringRef> IndirectCallTypeIds,
+ ArrayRef<StringRef> TypeIds) {
+ EmitSymEntry(AMDGPU::InfoKind::INFO_FUNC, Sym);
+
+ if (Info) {
+ AMDGPU::FuncInfoFlags Flags{};
+ if (Info->UsesVCC)
+ Flags |= AMDGPU::FuncInfoFlags::FUNC_USES_VCC;
+ if (Info->UsesFlatScratch)
+ Flags |= AMDGPU::FuncInfoFlags::FUNC_USES_FLAT_SCRATCH;
+ if (Info->HasDynStack)
+ Flags |= AMDGPU::FuncInfoFlags::FUNC_HAS_DYN_STACK;
+ EmitU32Entry(AMDGPU::InfoKind::INFO_FLAGS, llvm::to_underlying(Flags));
+ EmitU32Entry(AMDGPU::InfoKind::INFO_NUM_SGPR, Info->NumSGPR);
+ EmitU32Entry(AMDGPU::InfoKind::INFO_NUM_VGPR, Info->NumArchVGPR);
+ // INFO_NUM_AGPR is only emitted when the function actually uses AGPRs,
+ // since AGPRs are not available on all architectures.
+ if (Info->NumAccVGPR)
+ EmitU32Entry(AMDGPU::InfoKind::INFO_NUM_AGPR, Info->NumAccVGPR);
+ EmitU32Entry(AMDGPU::InfoKind::INFO_PRIVATE_SEGMENT_SIZE,
+ Info->PrivateSegmentSize);
+ }
+
+ for (MCSymbol *Res : Uses)
+ EmitSymEntry(AMDGPU::InfoKind::INFO_USE, Res);
+ for (MCSymbol *Dst : Calls)
+ EmitSymEntry(AMDGPU::InfoKind::INFO_CALL, Dst);
+ for (StringRef TypeId : IndirectCallTypeIds) {
+ EmitU32Entry(AMDGPU::InfoKind::INFO_INDIRECT_CALL,
+ getOrAddString(TypeId));
+ }
+ for (StringRef TypeId : TypeIds)
+ EmitU32Entry(AMDGPU::InfoKind::INFO_TYPEID, getOrAddString(TypeId));
+ });
+
+ if (!StrTab.empty()) {
+ StrTab.finalizeInOrder();
+ MCSectionELF *Sec = Context.getELFSection(".amdgpu.strtab", ELF::SHT_STRTAB,
+ ELF::SHF_EXCLUDE);
+ S.switchSection(Sec);
+ SmallString<128> Buf;
+ raw_svector_ostream OS(Buf);
+ StrTab.write(OS);
+ S.emitBytes(Buf);
+ }
+
+ S.popSection();
+}
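For reference, the `[kind: u8] [len: u8] [payload]` stream emitted above can be decoded generically: a consumer that hits an unknown kind only needs the length byte to resume at the next entry, which is the forward-compatibility property the commit message describes. A minimal Python sketch — the numeric kind values here are hypothetical; the authoritative enum lives in `llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h`:

```python
import struct

# Hypothetical kind values for illustration only; the real ones come from
# llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h.
INFO_FUNC, INFO_FLAGS, INFO_NUM_SGPR, INFO_NUM_VGPR = 0, 1, 2, 3

def walk_info_entries(data: bytes):
    """Decode a [kind: u8][len: u8][payload: len bytes] stream.

    Unknown kinds are returned as raw payloads rather than rejected: the
    length byte alone is enough to locate the next entry, which is what
    makes the format forward-compatible.
    """
    entries = []
    off = 0
    while off + 2 <= len(data):
        kind, length = data[off], data[off + 1]
        entries.append((kind, data[off + 2 : off + 2 + length]))
        off += 2 + length
    return entries

# Example: one u32 entry followed by an entry of an unrecognized kind.
blob = bytes([INFO_NUM_SGPR, 4]) + struct.pack("<I", 33) \
     + bytes([0xC8, 2, 0xAA, 0xBB])
```

A real consumer would additionally treat each `INFO_FUNC` entry as opening a new function scope that the following entries attach to.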
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
index 3a0d8dcd2d27..ca1fe3ccf3da 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
@@ -11,7 +11,10 @@
#include "Utils/AMDGPUBaseInfo.h"
#include "Utils/AMDGPUPALMetadata.h"
+#include "llvm/ADT/SmallVector.h"
#include "llvm/MC/MCStreamer.h"
+#include <string>
+#include <utility>
namespace llvm {
@@ -26,6 +29,27 @@ struct MCKernelDescriptor;
namespace HSAMD {
struct Metadata;
}
+
+struct FuncInfo {
+ uint32_t NumSGPR = 0;
+ uint32_t NumArchVGPR = 0;
+ uint32_t NumAccVGPR = 0;
+ uint32_t PrivateSegmentSize = 0;
+ bool UsesVCC = false;
+ bool UsesFlatScratch = false;
+ bool HasDynStack = false;
+
+ MCSymbol *Sym = nullptr;
+};
+
+struct InfoSectionData {
+ SmallVector<FuncInfo, 8> Funcs;
+ SmallVector<std::pair<MCSymbol *, MCSymbol *>, 4> Uses;
+ SmallVector<std::pair<MCSymbol *, MCSymbol *>, 8> Calls;
+ SmallVector<std::pair<MCSymbol *, std::string>, 4> IndirectCalls;
+ SmallVector<std::pair<MCSymbol *, std::string>, 4> TypeIds;
+};
+
} // namespace AMDGPU
class AMDGPUTargetStreamer : public MCTargetStreamer {
@@ -104,6 +128,8 @@ public:
const MCExpr *ReserveVCC,
const MCExpr *ReserveFlatScr) {}
+ virtual void emitAMDGPUInfo(const AMDGPU::InfoSectionData &Data) {}
+
static StringRef getArchNameFromElfMach(unsigned ElfMach);
static unsigned getElfMach(StringRef GPU);
@@ -168,6 +194,8 @@ public:
const MCExpr *NextVGPR, const MCExpr *NextSGPR,
const MCExpr *ReserveVCC,
const MCExpr *ReserveFlatScr) override;
+
+ void emitAMDGPUInfo(const AMDGPU::InfoSectionData &Data) override;
};
class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer {
@@ -221,6 +249,8 @@ public:
const MCExpr *NextVGPR, const MCExpr *NextSGPR,
const MCExpr *ReserveVCC,
const MCExpr *ReserveFlatScr) override;
+
+ void emitAMDGPUInfo(const AMDGPU::InfoSectionData &Data) override;
};
}
#endif
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-agpr.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-agpr.ll
new file mode 100644
index 000000000000..6442d6f6501c
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-agpr.ll
@@ -0,0 +1,23 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -amdgpu-enable-object-linking < %s | FileCheck %s
+
+; Verify that .amdgpu_num_agpr IS emitted when AGPRs are used on a target
+; that supports them (gfx908 has a separate AGPR file).
+
+declare <4 x float> @llvm.amdgcn.mfma.f32.4x4x1f32(float, float, <4 x float>, i32, i32, i32)
+
+define void @func_with_agpr(float %a, float %b, ptr addrspace(1) %out) {
+ %result = call <4 x float> @llvm.amdgcn.mfma.f32.4x4x1f32(float %a, float %b, <4 x float> zeroinitializer, i32 0, i32 0, i32 0)
+ store <4 x float> %result, ptr addrspace(1) %out
+ ret void
+}
+
+define amdgpu_kernel void @kern(float %a, float %b, ptr addrspace(1) %out) {
+ call void @func_with_agpr(float %a, float %b, ptr addrspace(1) %out)
+ ret void
+}
+
+; CHECK: .amdgpu_info func_with_agpr
+; CHECK: .amdgpu_num_agpr {{[1-9][0-9]*}}
+; CHECK: .end_amdgpu_info
+; CHECK: .amdgpu_info kern
+; CHECK: .end_amdgpu_info
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-callgraph.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-callgraph.ll
new file mode 100644
index 000000000000..0297a2a6e049
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-callgraph.ll
@@ -0,0 +1,62 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r --sections - | FileCheck %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=asm < %s | FileCheck %s --check-prefix=ASM --implicit-check-not=.amdgpu_num_agpr
+
+; Test that the unified .amdgpu.info section (.amdgpu_info blocks in assembly) is
+; emitted with correct relocations when object linking is enabled.
+
+declare void @extern_func()
+declare void @tail_extern()
+
+; The .amdgpu.info section should exist as SHT_PROGBITS with SHF_EXCLUDE.
+; CHECK: Section {
+; CHECK: Name: .amdgpu.info
+; CHECK: Type: SHT_PROGBITS
+; CHECK: Flags [
+; CHECK: SHF_EXCLUDE
+; CHECK: ]
+
+; Symbol references in the binary resource metadata still use R_AMDGPU_ABS64 relocations.
+; CHECK-DAG: R_AMDGPU_ABS64 my_kernel
+; CHECK-DAG: R_AMDGPU_ABS64 helper
+; CHECK-DAG: R_AMDGPU_ABS64 extern_func
+; COM: Tail-call callee must still be recorded as an INFO_CALL edge.
+; CHECK-DAG: R_AMDGPU_ABS64 tail_helper
+; CHECK-DAG: R_AMDGPU_ABS64 tail_extern
+
+; COM: Assembly: per-function .amdgpu_info blocks (target flags derived from
+; COM: e_flags).
+; ASM-DAG: .amdgpu_info helper
+; ASM-DAG: .amdgpu_flags {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG: .amdgpu_call extern_func
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info my_kernel
+; ASM-DAG: .amdgpu_flags {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG: .amdgpu_call helper
+; ASM-DAG: .end_amdgpu_info
+
+; COM: A tail call is lowered to SI_TCRETURN (isCall = 1). Verify that the
+; COM: callee edge is still captured in the .amdgpu_info block of the caller.
+; ASM-DAG: .amdgpu_info tail_helper
+; ASM-DAG: .amdgpu_call tail_extern
+; ASM-DAG: .end_amdgpu_info
+
+define void @helper() {
+ call void @extern_func()
+ ret void
+}
+
+define amdgpu_kernel void @my_kernel() {
+ call void @helper()
+ ret void
+}
+
+define void @tail_helper() {
+ tail call void @tail_extern()
+ ret void
+}
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll
index 46d4c8db00f0..02740e3bb0a1 100644
--- a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll
@@ -1,9 +1,11 @@
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -amdgpu-enable-object-linking < %s | FileCheck %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -amdgpu-enable-object-linking < %s | FileCheck %s --implicit-check-not=.amdgpu_num_agpr
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r --sections - | FileCheck %s --check-prefix=ELF
; Verify object linking codegen for named barriers on GFX1250:
; 1. Barrier instructions use M0-based forms with relocation references
-; 2. group_segment_fixed_size = 0 (linker patches it)
-; 3. Named barrier is emitted as an SHN_AMDGPU_LDS symbol (.amdgpu_lds)
+; 2. .amdgpu.info section records the barrier as an LDS use edge
+; 3. group_segment_fixed_size = 0 (linker patches it)
+; 4. Named barrier is emitted as an SHN_AMDGPU_LDS symbol (.amdgpu_lds)
@bar = internal addrspace(3) global [2 x target("amdgcn.named.barrier", 0)] poison
@@ -13,12 +15,29 @@
; CHECK: s_barrier_join m0
; CHECK: s_barrier_wait 1
-; KD: group_segment_fixed_size = 0 (linker will patch).
; CHECK: .amdhsa_group_segment_fixed_size 0
-; LDS symbol declaration
+; CHECK: .amdgpu_info kernel
+; CHECK: .amdgpu_flags {{[0-9]+}}
+; CHECK: .amdgpu_num_sgpr {{[0-9]+}}
+; CHECK: .amdgpu_num_vgpr {{[0-9]+}}
+; CHECK: .amdgpu_private_segment_size {{[0-9]+}}
+; CHECK: .amdgpu_use __amdgpu_named_barrier.bar{{[^ ,]*}}
+; CHECK: .amdgpu_call helper
+; CHECK: .end_amdgpu_info
+
; CHECK: .amdgpu_lds __amdgpu_named_barrier.bar{{[^ ,]*}}, 32, 4
+; ELF: Section {
+; ELF: Name: .amdgpu.info
+; ELF: Type: SHT_PROGBITS
+; ELF: Flags [
+; ELF: SHF_EXCLUDE
+
+; ELF-DAG: R_AMDGPU_ABS64 kernel
+; ELF-DAG: R_AMDGPU_ABS64 __amdgpu_named_barrier.bar{{[^ ]*}}
+; ELF-DAG: R_AMDGPU_ABS64 helper
+
define amdgpu_kernel void @kernel() {
call void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) @bar, i32 3)
call void @llvm.amdgcn.s.barrier.join(ptr addrspace(3) @bar)
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-typeid.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-typeid.ll
new file mode 100644
index 000000000000..afaf7cd19940
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-typeid.ll
@@ -0,0 +1,221 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r - | FileCheck %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=asm < %s | FileCheck %s --check-prefix=ASM --implicit-check-not=.amdgpu_num_agpr
+
+; Test ABI register-slot type ID generation for various function types.
+; The type ID encodes the return type followed by each parameter, by bit
+; width: v = void, i = up to 32 bits, l = 33-64 bits; wider types split
+; into ceil(bits/32) x "i". Types with the same register footprint share
+; an encoding (e.g. float(float) and i32(i32) both produce "ii"). Coverage
+; spans scalars, vectors (sized by their total element bit width), pointers
+; across address spaces (AS 1 is 64-bit on amdhsa; AS 3 / AS 5 are 32-bit),
+; and small integer types (i1/i8/i16) that are ABI-promoted to i32 slots
+; rather than passed natively -- they still encode as "i", like an i32.
+;
+; Cross-TU coverage: an address-taken declaration (defined in another TU) still
+; gets an .amdgpu_info scope with .amdgpu_typeid, but no per-function resource
+; counts since its body isn't available here.
+declare void @extern_decl(i32)
+
+define void @void_void() {
+ ret void
+}
+
+define i32 @i32_i32(i32 %x) {
+ ret i32 %x
+}
+
+define void @void_ptr_i32(ptr %p, i32 %x) {
+ ret void
+}
+
+define i64 @i64_i64_i64(i64 %a, i64 %b) {
+ ret i64 %a
+}
+
+define float @float_float(float %x) {
+ ret float %x
+}
+
+; Address-space pointer widths: AS 1 (global) is 64-bit -> "l"; AS 3 (LDS) and
+; AS 5 (private) are 32-bit -> "i".
+define void @ptr_addrspaces(ptr addrspace(1) %g, ptr addrspace(3) %l, ptr addrspace(5) %p) {
+ ret void
+}
+
+; Vector types: encoded by total bit width. <2 x i32> = 64 bits -> "l";
+; <4 x i32>/<4 x float>/<2 x i64> = 128 bits -> "iiii" each.
+define <4 x i32> @vectors(<2 x i32> %a, <4 x float> %b, <2 x i64> %c) {
+ ret <4 x i32> zeroinitializer
+}
+
+; Small integer types (i1/i8/i16) are ABI-promoted to i32 register slots on
+; AMDGPU. They all collapse to "i" under the bit-width scheme, matching an i32
+; parameter, so callers declared as void(i32, ...) remain compatible with
+; callees taking void(i8, ...). signext/zeroext attributes describe the
+; promotion mode and do not affect the encoding.
+define void @promoted_small_ints(i8 signext %a, i16 zeroext %b, i1 %c) {
+ ret void
+}
+
+; Wider non-vector scalars: double is "l" (64 bits); i128 widens to 4 x "i"
+; (ceil(128/32)).
+define double @wide_scalars(double %a, i128 %b) {
+ ret double %a
+}
+
+%Struct16 = type { i32, i32, i32, i32 }
+
+; byval / byref struct pointer parameters encode as a single pointer register
+; slot, the same as a plain pointer in the same address space. byval describes
+; a caller-side stack copy and byref describes a pointer handed through
+; unchanged; neither changes the callee's register footprint (one pointer),
+; so both collapse to "i" (for 32-bit AS) or "l" (for 64-bit AS). Compare
+; with the plain-pointer encodings in @ptr_addrspaces.
+define void @byval_struct_private(ptr addrspace(5) byval(%Struct16) %p) {
+ ret void
+}
+
+define void @byref_struct_constant(ptr addrspace(4) byref(%Struct16) %p) {
+ ret void
+}
+
+; Indirect-call type IDs are derived from the call instruction's FunctionType
+; using the same rules, so they match the .amdgpu_typeid of an ABI-compatible
+; address-taken callee. Duplicate signatures within one function are
+; deduplicated (the second void() call below shares "v" with the first and
+; yields only one .amdgpu_indirect_call entry; likewise a plain
+; ptr addrspace(5) call and a ptr addrspace(5) byval(...) call both encode as
+; "vi" and collapse to one entry).
+define void @icaller(ptr %f_void, ptr %f_ptrs, ptr %f_vec, ptr %f_small, ptr %f_wide, ptr %f_dup, ptr %f_priv, ptr %f_priv_byval, ptr %f_const, ptr %f_const_byref) {
+ call void %f_void()
+ call void %f_ptrs(ptr addrspace(1) null, ptr addrspace(3) null, ptr addrspace(5) null)
+ %v = call <4 x i32> %f_vec(<2 x i32> zeroinitializer, <4 x float> zeroinitializer, <2 x i64> zeroinitializer)
+ call void %f_small(i8 signext 0, i16 zeroext 0, i1 false)
+ %d = call double %f_wide(double 0.0, i128 0)
+ call void %f_dup()
+ call void %f_priv(ptr addrspace(5) null)
+ call void %f_priv_byval(ptr addrspace(5) byval(%Struct16) null)
+ call void %f_const(ptr addrspace(4) null)
+ call void %f_const_byref(ptr addrspace(4) byref(%Struct16) null)
+ ret void
+}
+
+; Take the address of each function so they appear as resource nodes.
+define void @taker() {
+ %p0 = alloca ptr, addrspace(5)
+ store volatile ptr @void_void, ptr addrspace(5) %p0
+ store volatile ptr @i32_i32, ptr addrspace(5) %p0
+ store volatile ptr @void_ptr_i32, ptr addrspace(5) %p0
+ store volatile ptr @i64_i64_i64, ptr addrspace(5) %p0
+ store volatile ptr @float_float, ptr addrspace(5) %p0
+ store volatile ptr @ptr_addrspaces, ptr addrspace(5) %p0
+ store volatile ptr @vectors, ptr addrspace(5) %p0
+ store volatile ptr @promoted_small_ints, ptr addrspace(5) %p0
+ store volatile ptr @wide_scalars, ptr addrspace(5) %p0
+ store volatile ptr @byval_struct_private, ptr addrspace(5) %p0
+ store volatile ptr @byref_struct_constant, ptr addrspace(5) %p0
+ store volatile ptr @extern_decl, ptr addrspace(5) %p0
+ ret void
+}
+
+define amdgpu_kernel void @kern() {
+ call void @taker()
+ call void @icaller(ptr @void_void, ptr @ptr_addrspaces, ptr @vectors,
+ ptr @promoted_small_ints, ptr @wide_scalars,
+ ptr @void_void,
+ ptr @byval_struct_private, ptr @byval_struct_private,
+ ptr @byref_struct_constant, ptr @byref_struct_constant)
+ ret void
+}
+
+; CHECK-DAG: R_AMDGPU_ABS64 void_void
+; CHECK-DAG: R_AMDGPU_ABS64 i32_i32
+; CHECK-DAG: R_AMDGPU_ABS64 void_ptr_i32
+; CHECK-DAG: R_AMDGPU_ABS64 i64_i64_i64
+; CHECK-DAG: R_AMDGPU_ABS64 float_float
+; CHECK-DAG: R_AMDGPU_ABS64 ptr_addrspaces
+; CHECK-DAG: R_AMDGPU_ABS64 vectors
+; CHECK-DAG: R_AMDGPU_ABS64 promoted_small_ints
+; CHECK-DAG: R_AMDGPU_ABS64 wide_scalars
+; CHECK-DAG: R_AMDGPU_ABS64 byval_struct_private
+; CHECK-DAG: R_AMDGPU_ABS64 byref_struct_constant
+; CHECK-DAG: R_AMDGPU_ABS64 extern_decl
+; CHECK-DAG: R_AMDGPU_ABS64 icaller
+; CHECK-DAG: R_AMDGPU_ABS64 taker
+; CHECK-DAG: R_AMDGPU_ABS64 kern
+
+; ASM-DAG: .amdgpu_info void_void
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_typeid "v"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info i32_i32
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_typeid "ii"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info void_ptr_i32
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_typeid "vli"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info i64_i64_i64
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_typeid "lll"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info float_float
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_typeid "ii"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info ptr_addrspaces
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_typeid "vlii"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info vectors
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_typeid "iiiiliiiiiiii"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info promoted_small_ints
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_typeid "viii"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info wide_scalars
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_typeid "lliiii"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info byval_struct_private
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_typeid "vi"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info byref_struct_constant
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_typeid "vl"
+; ASM-DAG: .end_amdgpu_info
+; COM: Address-taken declaration: only the type-ID appears in its scope, with
+; COM: no per-function resource counts (the body lives in another TU).
+; ASM-DAG: .amdgpu_info extern_decl
+; ASM-DAG: .amdgpu_typeid "vi"
+; ASM-DAG: .end_amdgpu_info
+; COM: @icaller's indirect call type IDs mirror the @void_void / @ptr_addrspaces /
+; COM: @vectors / @promoted_small_ints / @wide_scalars .amdgpu_typeid encodings
+; COM: above, proving ABI compatibility. The duplicate void() indirect call is
+; COM: deduplicated, so "v" appears only once. The byval/byref pairs dedupe
+; COM: with their plain-pointer counterparts: "vi" (AS 5) and "vl" (AS 4) each
+; COM: appear once despite two call sites apiece.
+; ASM-DAG: .amdgpu_info icaller
+; ASM-DAG: .amdgpu_flags 1
+; ASM-DAG: .amdgpu_indirect_call "v"
+; ASM-DAG: .amdgpu_indirect_call "vlii"
+; ASM-DAG: .amdgpu_indirect_call "iiiiliiiiiiii"
+; ASM-DAG: .amdgpu_indirect_call "viii"
+; ASM-DAG: .amdgpu_indirect_call "lliiii"
+; ASM-DAG: .amdgpu_indirect_call "vi"
+; ASM-DAG: .amdgpu_indirect_call "vl"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info taker
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG: .end_amdgpu_info
+; COM: The kernel scope is present but carries no type IDs of its own (kernels
+; COM: aren't indirect-call targets). Direct-call edges from the kernel body are
+; COM: exercised separately in the callgraph test.
+; ASM-DAG: .amdgpu_info kern
+; ASM-DAG: .amdgpu_flags {{[0-9]+}}
+; ASM-DAG: .end_amdgpu_info
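The bit-width encoding exercised by the checks above can be sketched independently of IR types. This hypothetical helper takes register-slot bit widths directly; the real implementation derives them from the call's `FunctionType` and the target datalayout:

```python
def encode_width(bits: int) -> str:
    # v = void, i = up to 32 bits, l = 33-64 bits; anything wider is
    # split into ceil(bits / 32) 32-bit "i" slots.
    if bits == 0:
        return "v"
    if bits <= 32:
        return "i"
    if bits <= 64:
        return "l"
    return "i" * ((bits + 31) // 32)

def type_id(ret_bits: int, param_bits: list[int]) -> str:
    # Return type first, then each parameter in order. Vectors contribute
    # their total element bit width; pointers contribute the width of
    # their address space (64 for AS 1, 32 for AS 3 / AS 5 on amdhsa).
    return encode_width(ret_bits) + "".join(encode_width(b) for b in param_bits)
```

Under this sketch, double(double, i128) yields "lliiii" and <4 x i32>(<2 x i32>, <4 x float>, <2 x i64>) yields "iiiiliiiiiiii", matching the .amdgpu_typeid strings checked above.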
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll
index 878f3abf7ccf..0020c2272d23 100644
--- a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll
@@ -1,17 +1,18 @@
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking < %s | FileCheck -check-prefixes=ASM %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r --syms - | FileCheck -check-prefixes=ELF %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking < %s | FileCheck -check-prefixes=ASM %s --implicit-check-not=.amdgpu_num_agpr
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r --syms --sections - | FileCheck -check-prefixes=ELF %s
; Test that with object linking enabled, external LDS declarations produce
-; @abs32@lo relocations, SHN_AMDGPU_LDS symbols, and .amdgpu_lds directives.
-; Covers multiple LDS variables with different sizes and alignments (including
-; zero-sized dynamic LDS), usage from both kernels and device functions, and
+; @abs32@lo relocations, SHN_AMDGPU_LDS symbols, .amdgpu_lds directives,
+; and .amdgpu_use edges in the .amdgpu.info section. Covers multiple LDS
+; variables with different sizes and alignments (including zero-sized dynamic
+; LDS), usage from both kernels and device functions, and
; group_segment_fixed_size = 0 (linker patches via binary patching).
@lds_large = external addrspace(3) global [256 x i8], align 16
@lds_small = external addrspace(3) global [128 x i8], align 4
@lds_dynamic = external addrspace(3) global [0 x i8], align 8
-; --- Assembly checks ---
+; Instruction-level relocation checks.
; ASM-LABEL: {{^}}device_func:
; ASM: v_add_u32_e32 v{{[0-9]+}}, lds_large@abs32@lo, v{{[0-9]+}}
@@ -19,17 +20,49 @@
; ASM-DAG: s_add_i32 s{{[0-9]+}}, s{{[0-9]+}}, lds_small@abs32@lo
; ASM-DAG: s_add_i32 s{{[0-9]+}}, s{{[0-9]+}}, lds_dynamic@abs32@lo
+; .amdgpu.info section with LDS use edges.
+; ASM-DAG: .amdgpu_info device_func
+; ASM-DAG: .amdgpu_flags {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG: .amdgpu_use lds_large
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info test_kernel
+; ASM-DAG: .amdgpu_flags {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG: .amdgpu_use lds_dynamic
+; ASM-DAG: .amdgpu_use lds_small
+; ASM-DAG: .amdgpu_call device_func
+; ASM-DAG: .end_amdgpu_info
+
+; SHN_AMDGPU_LDS directives.
; ASM-DAG: .amdgpu_lds lds_large, 256, 16
; ASM-DAG: .amdgpu_lds lds_small, 128, 4
; ASM-DAG: .amdgpu_lds lds_dynamic, 0, 8
; ASM: .group_segment_fixed_size: 0
-; --- ELF checks ---
+; .amdgpu.info section exists.
+; ELF: Section {
+; ELF: Name: .amdgpu.info
+; ELF: Type: SHT_PROGBITS
+; ELF: Flags [
+; ELF: SHF_EXCLUDE
+
+; Relocations.
; ELF-DAG: R_AMDGPU_ABS32_LO lds_large
; ELF-DAG: R_AMDGPU_ABS32_LO lds_small
; ELF-DAG: R_AMDGPU_ABS32_LO lds_dynamic
+; ELF-DAG: R_AMDGPU_ABS64 device_func
+; ELF-DAG: R_AMDGPU_ABS64 test_kernel
+; ELF-DAG: R_AMDGPU_ABS64 lds_large
+; ELF-DAG: R_AMDGPU_ABS64 lds_small
+; ELF-DAG: R_AMDGPU_ABS64 lds_dynamic
+; SHN_AMDGPU_LDS symbols.
; ELF-DAG: Name: lds_large
; ELF-DAG: Name: lds_small
; ELF-DAG: Name: lds_dynamic
diff --git a/llvm/test/MC/AMDGPU/amdgpu-info-err.s b/llvm/test/MC/AMDGPU/amdgpu-info-err.s
new file mode 100644
index 000000000000..22e6d2e29f47
--- /dev/null
+++ b/llvm/test/MC/AMDGPU/amdgpu-info-err.s
@@ -0,0 +1,43 @@
+// RUN: not llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx900 %s -filetype=null 2>&1 | FileCheck %s
+
+// Each error case aborts parsing of its enclosing .amdgpu_info block: the
+// parser returns on the failing directive, which implicitly exits the block
+// (no block-open state is tracked at the top level), so the next test case
+// starts fresh at the top level. The `.end_amdgpu_info` terminators are
+// intentionally omitted: adding them would itself trigger "unknown
+// directive" errors, because `.end_amdgpu_info` is only recognized inside
+// a block.
+
+// Missing function symbol after .amdgpu_info.
+.amdgpu_info
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected symbol name after .amdgpu_info
+
+// Unknown directive inside a .amdgpu_info block.
+.amdgpu_info f_unknown_dir
+ .amdgpu_bogus 1
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: unknown .amdgpu_info directive '.amdgpu_bogus'
+
+// .amdgpu_use with no resource symbol.
+.amdgpu_info f_use_missing
+ .amdgpu_use
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected resource symbol for .amdgpu_use
+
+// .amdgpu_call with no callee symbol.
+.amdgpu_info f_call_missing
+ .amdgpu_call
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected callee symbol for .amdgpu_call
+
+// .amdgpu_indirect_call with no type-ID string.
+.amdgpu_info f_icall_missing
+ .amdgpu_indirect_call
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected type ID string for .amdgpu_indirect_call
+
+// .amdgpu_typeid with no type-ID string.
+.amdgpu_info f_typeid_missing
+ .amdgpu_typeid
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected type ID string for .amdgpu_typeid
+
+// Non-identifier token where a directive or .end_amdgpu_info is expected.
+.amdgpu_info f_bad_token
+ 123
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected directive or .end_amdgpu_info
diff --git a/llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s b/llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s
new file mode 100644
index 000000000000..d49890eb0517
--- /dev/null
+++ b/llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s
@@ -0,0 +1,126 @@
+// RUN: llvm-mc -triple=amdgcn-amd-amdhsa -mcpu=gfx900 -filetype=asm %s | FileCheck --check-prefix=ASM %s
+// RUN: llvm-mc -triple=amdgcn-amd-amdhsa -mcpu=gfx900 -filetype=obj %s | llvm-readobj -r --sections --section-data --string-dump=.amdgpu.strtab - | FileCheck --check-prefix=OBJ %s
+
+// Test that .amdgpu_info directives round-trip through the assembler (asm and
+// object emission) and produce the correct TLV-encoded .amdgpu.info section.
+
+ .text
+ .globl my_kernel
+ .p2align 8
+ .type my_kernel,@function
+my_kernel:
+ s_endpgm
+.Lfunc_end0:
+ .size my_kernel, .Lfunc_end0-my_kernel
+
+ .globl helper
+ .p2align 2
+ .type helper,@function
+helper:
+ s_setpc_b64 s[30:31]
+.Lfunc_end1:
+ .size helper, .Lfunc_end1-helper
+
+ .globl addr_taken_func
+ .p2align 2
+ .type addr_taken_func,@function
+addr_taken_func:
+ s_setpc_b64 s[30:31]
+.Lfunc_end2:
+ .size addr_taken_func, .Lfunc_end2-addr_taken_func
+
+ .globl extern_func
+
+// COM: Kernel: flags=7 (KERNEL|VCC|FLAT_SCRATCH), resource counts, a use
+// COM: edge, a call edge, and an indirect call type ID. Non-zero AGPR count
+// COM: verifies conditional emission of .amdgpu_num_agpr.
+ .amdgpu_info my_kernel
+ .amdgpu_flags 7
+ .amdgpu_num_sgpr 33
+ .amdgpu_num_vgpr 32
+ .amdgpu_num_agpr 4
+ .amdgpu_private_segment_size 0
+ .amdgpu_use lds_var
+ .amdgpu_call helper
+ .amdgpu_indirect_call "vi"
+ .end_amdgpu_info
+
+// COM: Device function: flags=2 (VCC), call edge to an external symbol. The
+// COM: .amdgpu_num_agpr directive is omitted here; the parser defaults the
+// COM: count to 0 and the emitter skips zero counts.
+ .amdgpu_info helper
+ .amdgpu_flags 2
+ .amdgpu_num_sgpr 8
+ .amdgpu_num_vgpr 10
+ .amdgpu_private_segment_size 16
+ .amdgpu_call extern_func
+ .end_amdgpu_info
+
+// Address-taken function with type ID. Zero AGPR omitted.
+ .amdgpu_info addr_taken_func
+ .amdgpu_flags 0
+ .amdgpu_num_sgpr 2
+ .amdgpu_num_vgpr 4
+ .amdgpu_private_segment_size 0
+ .amdgpu_typeid "vi"
+ .end_amdgpu_info
+
+// ASM: .amdgpu_info my_kernel
+// ASM: .amdgpu_flags 7
+// ASM: .amdgpu_num_sgpr 33
+// ASM: .amdgpu_num_vgpr 32
+// ASM: .amdgpu_num_agpr 4
+// ASM: .amdgpu_private_segment_size 0
+// ASM: .amdgpu_use lds_var
+// ASM: .amdgpu_call helper
+// ASM: .amdgpu_indirect_call "vi"
+// ASM: .end_amdgpu_info
+
+// ASM: .amdgpu_info helper
+// ASM: .amdgpu_flags 2
+// ASM: .amdgpu_num_sgpr 8
+// ASM: .amdgpu_num_vgpr 10
+// ASM-NOT: .amdgpu_num_agpr
+// ASM: .amdgpu_private_segment_size 16
+// ASM: .amdgpu_call extern_func
+// ASM: .end_amdgpu_info
+
+// ASM: .amdgpu_info addr_taken_func
+// ASM: .amdgpu_flags 0
+// ASM: .amdgpu_num_sgpr 2
+// ASM: .amdgpu_num_vgpr 4
+// ASM-NOT: .amdgpu_num_agpr
+// ASM: .amdgpu_private_segment_size 0
+// ASM: .amdgpu_typeid "vi"
+// ASM: .end_amdgpu_info
+
+// OBJ: Section {
+// OBJ: Name: .amdgpu.info
+// OBJ: Type: SHT_PROGBITS
+// OBJ: Flags [
+// OBJ: SHF_EXCLUDE
+// OBJ: ]
+// OBJ: }
+
+// The string pool backs INFO_INDIRECT_CALL / INFO_TYPEID payloads. It is an
+// ELF-convention SHT_STRTAB with a leading null byte at offset 0 and string
+// deduplication -- both directives above reference the same "vi" TypeID, so
+// it must appear exactly once starting at offset 1.
+// OBJ: Section {
+// OBJ: Name: .amdgpu.strtab
+// OBJ: Type: SHT_STRTAB
+// OBJ: Flags [
+// OBJ: SHF_EXCLUDE
+// OBJ: ]
+// OBJ: }
+
+// Relocations in .amdgpu.info should reference defined and external symbols.
+// OBJ-DAG: R_AMDGPU_ABS64 my_kernel
+// OBJ-DAG: R_AMDGPU_ABS64 helper
+// OBJ-DAG: R_AMDGPU_ABS64 addr_taken_func
+// OBJ-DAG: R_AMDGPU_ABS64 extern_func
+// OBJ-DAG: R_AMDGPU_ABS64 lds_var
+
+// OBJ: String dump of section '.amdgpu.strtab':
+// OBJ-NEXT: [{{ +}}1] vi
+// OBJ-NOT: ] vi
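The string-dump expectations above follow from the ELF string-table convention the companion `.amdgpu.strtab` section uses: a leading NUL at offset 0, then each distinct string NUL-terminated, deduplicated on insertion. A rough sketch of that layout (ignoring the tail-merging `StringTableBuilder` can also perform):

```python
def build_strtab(strings):
    # ELF convention: offset 0 holds a NUL byte so that offset 0 can mean
    # "no string"; identical strings share one copy and one offset.
    blob = bytearray(b"\x00")
    offsets = {}
    for s in strings:
        if s not in offsets:
            offsets[s] = len(blob)
            blob += s.encode() + b"\x00"
    return bytes(blob), offsets
```

With the two "vi" references from the round-trip input, "vi" lands exactly once at offset 1, which is what the `String dump` checks require.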