| field | value | date |
|---|---|---|
| author | Shilei Tian <i@tianshilei.me> | 2026-04-15 22:27:49 -0400 |
| committer | Shilei Tian <i@tianshilei.me> | 2026-04-24 17:46:26 -0400 |
| commit | 2826b518dfbfa81b01eb8c927faa67b439760237 | |
| tree | 7d5a9a950b8d081cccf9a78685317f66875f33de | |
| parent | b49855fc5684eebd47d177df1fbfbd329653bbd1 | |
[AMDGPU] Add `.amdgpu.info` section for per-function metadata (branch: users/shiltian/amdgpu-function-info)
AMDGPU object linking requires the linker to propagate resource usage
(registers, stack, LDS) across translation units. To support this, the compiler
must emit per-function metadata and call graph edges in the relocatable object
so the linker can compute whole-program resource requirements.
This PR introduces a `.amdgpu.info` ELF section that uses a tagged,
length-prefixed binary format. Each entry is encoded as:
```
[kind: u8] [len: u8] [payload: <len> bytes]
```
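For illustration, here is a minimal C++ sketch of a producer for this encoding. The kind values match the entry-kind table in the docs change; the helper names and payload conventions (little-endian u32) are hypothetical, not the patch's actual emitter API:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative subset of the entry kinds defined by the patch.
enum InfoKind : uint8_t { INFO_FUNC = 1, INFO_FLAGS = 2, INFO_NUM_SGPR = 3 };

// Append one entry: a 1-byte kind, a 1-byte length, then <len> payload bytes.
static void emitEntry(std::vector<uint8_t> &Out, uint8_t Kind,
                      const std::vector<uint8_t> &Payload) {
  assert(Payload.size() <= 0xFF && "payload must fit in the u8 length field");
  Out.push_back(Kind);
  Out.push_back(static_cast<uint8_t>(Payload.size()));
  Out.insert(Out.end(), Payload.begin(), Payload.end());
}

// Little-endian u32 payload helper; the scalar attributes in this section
// (flags, register counts, private segment size) are all u32.
static std::vector<uint8_t> u32le(uint32_t V) {
  return {static_cast<uint8_t>(V), static_cast<uint8_t>(V >> 8),
          static_cast<uint8_t>(V >> 16), static_cast<uint8_t>(V >> 24)};
}
```

An `INFO_NUM_SGPR` entry with value 33 thus serializes to six bytes: `03 04 21 00 00 00`.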
A function scope is opened by an `INFO_FUNC` entry (containing a symbol
reference), followed by per-function attributes (register counts, flags, private
segment size) and relational edges (direct calls, LDS uses, indirect call
signatures). String data such as function type signatures is stored in a
companion `.amdgpu.strtab` section.
The format is forward-compatible: a consumer that encounters an unknown kind can
skip it by reading the length byte, allowing new entry kinds to be added without
breaking existing toolchains.
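The skip-unknown-kinds rule can be sketched as a small consumer loop. This is illustrative only: LLD's actual reader is not part of this patch, and the `Entry`/`scanInfoSection` names are made up:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One decoded [kind][len][payload] entry; Payload points into the section.
struct Entry { uint8_t Kind; const uint8_t *Payload; uint8_t Len; };

// Walk the entry stream, collecting every entry. A kind the consumer does
// not recognize is skipped the same way as a known one: by trusting the
// length byte. Returns an empty vector if the stream is truncated.
static std::vector<Entry> scanInfoSection(const uint8_t *Data, size_t Size) {
  std::vector<Entry> Entries;
  size_t Off = 0;
  while (Off + 2 <= Size) {
    uint8_t Kind = Data[Off], Len = Data[Off + 1];
    if (Off + 2 + Len > Size)
      return {}; // malformed: payload runs past the end of the section
    Entries.push_back({Kind, Data + Off + 2, Len});
    Off += 2 + static_cast<size_t>(Len); // unknown kinds advance identically
  }
  return Off == Size ? Entries : std::vector<Entry>{};
}
```

Because the advance step never inspects the kind, adding entry kind 11 tomorrow cannot break a consumer built against today's list.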
| -rw-r--r-- | llvm/docs/AMDGPUUsage.rst | 106 |
| -rw-r--r-- | llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h | 74 |
| -rw-r--r-- | llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 157 |
| -rw-r--r-- | llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h | 10 |
| -rw-r--r-- | llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp | 3 |
| -rw-r--r-- | llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp | 113 |
| -rw-r--r-- | llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp | 179 |
| -rw-r--r-- | llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h | 30 |
| -rw-r--r-- | llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-agpr.ll | 23 |
| -rw-r--r-- | llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-callgraph.ll | 62 |
| -rw-r--r-- | llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll | 29 |
| -rw-r--r-- | llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-typeid.ll | 221 |
| -rw-r--r-- | llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll | 47 |
| -rw-r--r-- | llvm/test/MC/AMDGPU/amdgpu-info-err.s | 43 |
| -rw-r--r-- | llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s | 126 |
15 files changed, 1209 insertions(+), 14 deletions(-)
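The type-ID strings used for indirect-call matching are derived from function signatures by `computeTypeId` in the `AMDGPUAsmPrinter.cpp` change below. A simplified standalone model of that encoding (bit widths are passed directly here rather than queried from a `DataLayout`, and 0 stands in for `void`; the real code asserts on zero-sized non-void types instead):

```cpp
#include <string>
#include <vector>

// Encode a signature as: return type first, then each parameter.
// void -> 'v'; <=32 bits -> 'i'; <=64 bits -> 'l';
// wider types expand to ceil(bits/32) 'i' characters.
static std::string encodeTypeId(unsigned RetBits,
                                const std::vector<unsigned> &ParamBits) {
  auto Append = [](std::string &Enc, unsigned Bits) {
    if (Bits == 0)        // 0 models `void` in this sketch
      Enc += 'v';
    else if (Bits <= 32)
      Enc += 'i';
    else if (Bits <= 64)
      Enc += 'l';
    else
      Enc.append((Bits + 31) / 32, 'i');
  };
  std::string Enc;
  Append(Enc, RetBits);
  for (unsigned B : ParamBits)
    Append(Enc, B);
  return Enc;
}
```

Under this scheme a `void(i32)` signature encodes as `"vi"`, matching the `.amdgpu_indirect_call "vi"` example in the documentation change.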
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst index 1f7f1f92f5e2..dca7b9accded 100644 --- a/llvm/docs/AMDGPUUsage.rst +++ b/llvm/docs/AMDGPUUsage.rst @@ -2838,6 +2838,8 @@ An AMDGPU target ELF code object has the standard ELF sections which include: ``.strtab`` ``SHT_STRTAB`` *none* ``.symtab`` ``SHT_SYMTAB`` *none* ``.text`` ``SHT_PROGBITS`` ``SHF_ALLOC`` + ``SHF_EXECINSTR`` + ``.amdgpu.info`` ``SHT_PROGBITS`` ``SHF_EXCLUDE`` + ``.amdgpu.strtab`` ``SHT_STRTAB`` ``SHF_EXCLUDE`` ================== ================ ================================= These sections have their standard meanings (see [ELF]_) and are only generated @@ -2873,6 +2875,67 @@ if needed. ``.amdgpu.kernel.runtime.handle`` Symbols used for device enqueue. +.. _amdgpu-info-section: + +``.amdgpu.info`` + Per-function metadata for AMDGPU object linking, emitted only in relocatable + code objects when object linking is enabled + (``-amdgpu-enable-object-linking``). The linker uses this section to + propagate resource usage (registers, stack, LDS) and resolve call graph + dependencies across translation units. + + Each entry uses a tagged, length-prefixed binary encoding: + + .. code-block:: none + + [kind: u8] [len: u8] [payload: <len> bytes] + + A function scope is opened by an ``INFO_FUNC`` entry whose payload is an + 8-byte relocated symbol reference. All subsequent entries until the next + ``INFO_FUNC`` or end of section belong to that scope. The format is + forward-compatible: unknown kinds can be skipped by reading the length byte. + + .. 
table:: AMDGPU Info Entry Kinds + :name: amdgpu-info-entry-kinds-table + + ===== ============================== ========================================== + Value Name Payload + ===== ============================== ========================================== + 1 ``INFO_FUNC`` 8B symbol ref; opens function scope + 2 ``INFO_FLAGS`` u32; ``FuncInfoFlags`` bitfield + 3 ``INFO_NUM_SGPR`` u32; SGPRs explicitly used + 4 ``INFO_NUM_VGPR`` u32; architectural VGPRs used + 5 ``INFO_NUM_AGPR`` u32; accumulator VGPRs (AGPRs) used + 6 ``INFO_PRIVATE_SEGMENT_SIZE`` u32; private (scratch) segment bytes + 7 ``INFO_USE`` 8B symbol ref; resource dependency edge + 8 ``INFO_CALL`` 8B symbol ref; direct call edge + 9 ``INFO_INDIRECT_CALL`` u32 strtab offset; indirect call type-ID + 10 ``INFO_TYPEID`` u32 strtab offset; function type-ID + ===== ============================== ========================================== + + .. table:: AMDGPU Info Function Flags (``INFO_FLAGS``) + :name: amdgpu-info-flags-table + + ===== =========================== ========================================== + Bit Name Description + ===== =========================== ========================================== + 0x1 ``FUNC_USES_VCC`` Function uses the VCC register + 0x2 ``FUNC_USES_FLAT_SCRATCH`` Function uses flat scratch addressing + 0x4 ``FUNC_HAS_DYN_STACK`` Function has dynamic stack allocation + ===== =========================== ========================================== + + Symbol references (``INFO_FUNC``, ``INFO_USE``, ``INFO_CALL``) generate + ``R_AMDGPU_ABS64`` relocations in ``.rela.amdgpu.info``. String payloads + (``INFO_INDIRECT_CALL``, ``INFO_TYPEID``) store a ``u32`` offset into + the companion ``.amdgpu.strtab`` section. + + See :ref:`amdgpu-assembler-directive-amdgpu-info` for the assembly syntax. + +``.amdgpu.strtab`` + Null-terminated string pool for the ``.amdgpu.info`` section. Contains + type-ID strings referenced by ``INFO_INDIRECT_CALL`` and ``INFO_TYPEID`` + entries. 
Only present when ``.amdgpu.info`` requires string data. + .. _amdgpu-note-records: Note Records @@ -21766,6 +21829,49 @@ semantics described in :ref:`amdgpu-amdhsa-code-object-metadata-v3`, This directive is terminated by an ``.end_amdgpu_metadata`` directive. +.. _amdgpu-assembler-directive-amdgpu-info: + +.amdgpu_info <symbol> ++++++++++++++++++++++ + +Begins a per-function metadata block for ``<symbol>`` in the ``.amdgpu.info`` +section (see :ref:`amdgpu-info-section`). Only valid when the OS is ``amdhsa``. +The block is terminated by an ``.end_amdgpu_info`` directive. + +The following sub-directives may appear inside the block: + + .. table:: .amdgpu_info Sub-Directives + :name: amdgpu-info-sub-directives-table + + ====================================== ========================================== + Directive Description + ====================================== ========================================== + ``.amdgpu_flags`` *value* ``FuncInfoFlags`` bitfield (u32) + ``.amdgpu_num_sgpr`` *value* SGPRs explicitly used (u32) + ``.amdgpu_num_vgpr`` *value* Architectural VGPRs used (u32) + ``.amdgpu_num_agpr`` *value* Accumulator VGPRs used (u32) + ``.amdgpu_private_segment_size`` *n* Private segment size in bytes (u32) + ``.amdgpu_use`` *symbol* Resource dependency (LDS or barrier) + ``.amdgpu_call`` *symbol* Direct call edge to *symbol* + ``.amdgpu_indirect_call`` *"type-id"* Indirect call with given type-ID string + ``.amdgpu_typeid`` *"type-id"* Type-ID for an address-taken function + ====================================== ========================================== + +Example: + +.. code-block:: nasm + + .amdgpu_info my_kernel + .amdgpu_flags 7 + .amdgpu_num_sgpr 33 + .amdgpu_num_vgpr 32 + .amdgpu_num_agpr 0 + .amdgpu_private_segment_size 0 + .amdgpu_use lds_var + .amdgpu_call helper + .amdgpu_indirect_call "vi" + .end_amdgpu_info + .. 
_amdgpu-amdhsa-assembler-example-v3-onwards: Code Object V3 and Above Example Source Code diff --git a/llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h b/llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h new file mode 100644 index 000000000000..e65161e6545f --- /dev/null +++ b/llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h @@ -0,0 +1,74 @@ +//===--- AMDGPUObjLinkingInfo.h ---------------------------------*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +/// \file +/// Enums shared between the AMDGPU backend (LLVM) and the ELF linker (LLD) +/// for the `.amdgpu.info` object-linking metadata section. +/// +/// Binary layout of each entry: [kind: u8] [len: u8] [payload: <len> bytes]. +/// Unknown kinds are forward-compatible: a consumer skips them by reading len. +// +//===----------------------------------------------------------------------===// + +#ifndef LLVM_SUPPORT_AMDGPUOBJECTLINKINGINFO_H +#define LLVM_SUPPORT_AMDGPUOBJECTLINKINGINFO_H + +#include "llvm/ADT/BitmaskEnum.h" + +#include <cstdint> + +namespace llvm { +namespace AMDGPU { + +/// Entry kind values for the `.amdgpu.info` section. +/// +/// Entries that appear between an INFO_FUNC and the next INFO_FUNC (or end of +/// section) belong to the function scope opened by that INFO_FUNC. +enum class InfoKind : uint8_t { + /// Opens a new function scope. Payload is an 8-byte symbol reference + /// (relocated) identifying the function. All subsequent entries until the + /// next INFO_FUNC belong to this function. + INFO_FUNC = 1, + /// Bitfield of FuncInfoFlags properties for the function. [u32] + INFO_FLAGS = 2, + /// Number of SGPRs explicitly used by the function. 
[u32] + INFO_NUM_SGPR = 3, + /// Number of architectural VGPRs used by the function. [u32] + INFO_NUM_VGPR = 4, + /// Number of accumulator VGPRs (AGPRs) used by the function. [u32] + INFO_NUM_AGPR = 5, + /// Private (scratch) memory size in bytes required by the function. [u32] + INFO_PRIVATE_SEGMENT_SIZE = 6, + /// Dependency edge: the function uses the resource identified by the + /// 8-byte relocated symbol (e.g. an LDS variable or named barrier). + INFO_USE = 7, + /// Direct call edge: the function calls the callee identified by the + /// 8-byte relocated symbol. + INFO_CALL = 8, + /// Indirect call edge: the function contains an indirect call whose + /// callee is expected to match the type-ID string at the given + /// `.amdgpu.strtab` offset. [u32] + INFO_INDIRECT_CALL = 9, + /// Function type ID: tags an address-taken function with a type-ID + /// string (at the given `.amdgpu.strtab` offset) so the linker can match + /// it against INFO_INDIRECT_CALL entries. [u32] + INFO_TYPEID = 10, +}; + +/// Per-function flags packed into INFO_FLAGS entries. 
+enum class FuncInfoFlags : uint32_t { + FUNC_USES_VCC = 1U << 0, + FUNC_USES_FLAT_SCRATCH = 1U << 1, + FUNC_HAS_DYN_STACK = 1U << 2, + LLVM_MARK_AS_BITMASK_ENUM(/*LargestValue=*/FUNC_HAS_DYN_STACK), +}; + +} // namespace AMDGPU +} // namespace llvm + +#endif // LLVM_SUPPORT_AMDGPUOBJECTLINKINGINFO_H diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp index 718b2b154e25..2e5e9ef0a3f5 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp @@ -32,6 +32,7 @@ #include "Utils/AMDGPUBaseInfo.h" #include "Utils/AMDKernelCodeTUtils.h" #include "Utils/SIDefinesUtils.h" +#include "llvm/ADT/StringSet.h" #include "llvm/Analysis/OptimizationRemarkEmitter.h" #include "llvm/BinaryFormat/ELF.h" #include "llvm/CodeGen/AsmPrinterHandler.h" @@ -537,6 +538,133 @@ void AMDGPUAsmPrinter::validateMCResourceInfo(Function &F) { } } +static void appendTypeEncoding(std::string &Enc, Type *Ty, + const DataLayout &DL) { + if (Ty->isVoidTy()) { + Enc += 'v'; + return; + } + unsigned Bits = DL.getTypeSizeInBits(Ty); + // Zero-sized non-void types (e.g. `{}` or `[0 x i8]`) would collapse to the + // same encoding as i32, which would silently conflate distinct signatures. + // These aren't expected in indirect-callable signatures in practice. 
+ assert(Bits > 0 && "Unexpected zero-sized non-void type in type-ID " + "encoding; encoding would be ambiguous"); + if (Bits <= 32) + Enc += 'i'; + else if (Bits <= 64) + Enc += 'l'; + else + Enc.append(divideCeil(Bits, 32), 'i'); +} + +static std::string computeTypeId(const FunctionType *FTy, + const DataLayout &DL) { + std::string Enc; + appendTypeEncoding(Enc, FTy->getReturnType(), DL); + for (Type *ParamTy : FTy->params()) + appendTypeEncoding(Enc, ParamTy, DL); + return Enc; +} + +void AMDGPUAsmPrinter::collectCallEdge(const MachineInstr &MI) { + if (!AMDGPUTargetMachine::EnableObjectLinking) + return; + const SIInstrInfo *TII = MF->getSubtarget<GCNSubtarget>().getInstrInfo(); + const MachineOperand *Callee = + TII->getNamedOperand(MI, AMDGPU::OpName::callee); + if (!Callee || !Callee->isGlobal()) + return; + DirectCallEdges.insert( + {getSymbol(&MF->getFunction()), getSymbol(Callee->getGlobal())}); +} + +void AMDGPUAsmPrinter::emitAMDGPUInfo(Module &M) { + if (!AMDGPUTargetMachine::EnableObjectLinking) + return; + + const NamedMDNode *LDSMD = M.getNamedMetadata("amdgpu.lds.uses"); + bool HasLDSUses = LDSMD && LDSMD->getNumOperands() > 0; + + const NamedMDNode *BarMD = M.getNamedMetadata("amdgpu.named_barrier.uses"); + bool HasNamedBarriers = BarMD && BarMD->getNumOperands() > 0; + + // Collect address-taken functions (with type IDs) and indirect call sites. 
+ DenseMap<const Function *, std::string> AddrTakenTypeIds; + using IndirectCallInfo = std::pair<const Function *, std::string>; + SmallVector<IndirectCallInfo, 8> IndirectCalls; + + for (const Function &F : M) { + bool IsKernel = AMDGPU::isKernel(F.getCallingConv()); + + if (!IsKernel && F.hasAddressTaken(/*PutOffender=*/nullptr, + /*IgnoreCallbackUses=*/false, + /*IgnoreAssumeLikeCalls=*/true, + /*IgnoreLLVMUsed=*/true)) { + AddrTakenTypeIds[&F] = + computeTypeId(F.getFunctionType(), M.getDataLayout()); + } + + if (F.isDeclaration()) + continue; + + StringSet<> SeenTypeIds; + for (const BasicBlock &BB : F) { + for (const Instruction &I : BB) { + const auto *CB = dyn_cast<CallBase>(&I); + if (!CB || !CB->isIndirectCall()) + continue; + std::string TId = + computeTypeId(CB->getFunctionType(), M.getDataLayout()); + if (SeenTypeIds.insert(TId).second) + IndirectCalls.push_back({&F, std::move(TId)}); + } + } + } + + if (FunctionInfos.empty() && DirectCallEdges.empty() && !HasLDSUses && + !HasNamedBarriers && AddrTakenTypeIds.empty() && IndirectCalls.empty()) + return; + + AMDGPU::InfoSectionData Data; + Data.Funcs = std::move(FunctionInfos); + + for (auto &[F, TypeId] : AddrTakenTypeIds) { + MCSymbol *Sym = getSymbol(F); + Data.TypeIds.push_back({Sym, TypeId}); + } + + for (auto &[CallerSym, CalleeSym] : DirectCallEdges) + Data.Calls.push_back({CallerSym, CalleeSym}); + DirectCallEdges.clear(); + + if (HasLDSUses) { + for (const MDNode *N : LDSMD->operands()) { + auto *Func = mdconst::extract<Function>(N->getOperand(0)); + auto *LdsVar = mdconst::extract<GlobalVariable>(N->getOperand(1)); + Data.Uses.push_back({getSymbol(Func), getSymbol(LdsVar)}); + } + } + + if (HasNamedBarriers) { + for (const MDNode *N : BarMD->operands()) { + auto *BarVar = mdconst::extract<GlobalVariable>(N->getOperand(0)); + MCSymbol *BarSym = getSymbol(BarVar); + for (unsigned I = 1, E = N->getNumOperands(); I < E; ++I) { + auto *Func = mdconst::extract<Function>(N->getOperand(I)); + 
Data.Uses.push_back({getSymbol(Func), BarSym}); + } + } + } + + for (auto &[Caller, Enc] : IndirectCalls) { + MCSymbol *CallerSym = getSymbol(Caller); + Data.IndirectCalls.push_back({CallerSym, Enc}); + } + + getTargetStreamer()->emitAMDGPUInfo(Data); +} + bool AMDGPUAsmPrinter::doFinalization(Module &M) { // Pad with s_code_end to help tools and guard against instruction prefetch // causing stale data in caches. Arguably this should be done by the linker, @@ -553,6 +681,10 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) { } } + // Emit the unified .amdgpu.info section (per-function resources, call graph, + // LDS/named-barrier use edges, indirect calls, and address-taken type IDs). + emitAMDGPUInfo(M); + // Assign expressions which can only be resolved when all other functions are // known. RI.finalize(OutContext); @@ -567,8 +699,15 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) { RI.getMaxSGPRSymbol(OutContext), RI.getMaxNamedBarrierSymbol(OutContext)); OutStreamer->popSection(); - for (Function &F : M.functions()) - validateMCResourceInfo(F); + // In the object-linking pipeline per-function resource MCExprs reference + // external callee symbols that cannot be evaluated here, so cross-TU limit + // checks would silently no-op for every non-leaf function. Defer resource + // sanity checking to the linker, which re-validates against the aggregated + // call graph in the combined .amdgpu.info metadata. 
+ if (!AMDGPUTargetMachine::EnableObjectLinking) { + for (Function &F : M.functions()) + validateMCResourceInfo(F); + } RI.reset(); @@ -729,6 +868,20 @@ bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) { RI.gatherResourceInfo(MF, *ResourceUsage, OutContext); + if (AMDGPUTargetMachine::EnableObjectLinking) { + const AMDGPUResourceUsageAnalysisWrapperPass::FunctionResourceInfo &RU = + *ResourceUsage; + FunctionInfos.push_back( + {/*NumSGPR=*/static_cast<uint32_t>(RU.NumExplicitSGPR), + /*NumArchVGPR=*/static_cast<uint32_t>(RU.NumVGPR), + /*NumAccVGPR=*/static_cast<uint32_t>(RU.NumAGPR), + /*PrivateSegmentSize=*/static_cast<uint32_t>(RU.PrivateSegmentSize), + /*UsesVCC=*/RU.UsesVCC, + /*UsesFlatScratch=*/RU.UsesFlatScratch, + /*HasDynStack=*/RU.HasDynamicallySizedStack, + /*Sym=*/getSymbol(&MF.getFunction())}); + } + if (MFI->isModuleEntryFunction()) { getSIProgramInfo(CurrentProgramInfo, MF); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h index 31d10fe92ca2..9066b2d419f8 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h @@ -15,7 +15,10 @@ #define LLVM_LIB_TARGET_AMDGPU_AMDGPUASMPRINTER_H #include "AMDGPUMCResourceInfo.h" +#include "AMDGPUResourceUsageAnalysis.h" +#include "MCTargetDesc/AMDGPUTargetStreamer.h" #include "SIProgramInfo.h" +#include "llvm/ADT/SetVector.h" #include "llvm/CodeGen/AsmPrinter.h" namespace llvm { @@ -86,6 +89,13 @@ private: void initTargetStreamer(Module &M); + void emitAMDGPUInfo(Module &M); + void collectCallEdge(const MachineInstr &MI); + + SetVector<std::pair<MCSymbol *, MCSymbol *>> DirectCallEdges; + + SmallVector<AMDGPU::FuncInfo, 8> FunctionInfos; + SmallString<128> getMCExprStr(const MCExpr *Value); /// Attempts to replace the validation that is missed in getSIProgramInfo due diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp index 56592bde3b1c..3c89e3d287b3 
100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp @@ -320,6 +320,9 @@ static void emitVGPRBlockComment(const MachineInstr *MI, const SIInstrInfo *TII, } void AMDGPUAsmPrinter::emitInstruction(const MachineInstr *MI) { + if (MI->isCall()) + collectCallEdge(*MI); + // FIXME: Enable feature predicate checks once all the test pass. // AMDGPU_MC::verifyInstructionPredicates(MI->getOpcode(), // getSubtargetInfo().getFeatureBits()); diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp index 2e300733b5c9..5d11e8a66c7c 100644 --- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp +++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp @@ -38,6 +38,7 @@ #include "llvm/MC/MCSymbol.h" #include "llvm/MC/TargetRegistry.h" #include "llvm/Support/AMDGPUMetadata.h" +#include "llvm/Support/AMDGPUObjLinkingInfo.h" #include "llvm/Support/AMDHSAKernelDescriptor.h" #include "llvm/Support/Casting.h" #include "llvm/Support/Compiler.h" @@ -1382,6 +1383,9 @@ class AMDGPUAsmParser : public MCTargetAsmParser { return getRegBitWidth(RCID) / 8; } + AMDGPU::InfoSectionData InfoData; + bool HasInfoData = false; + private: void createConstantSymbol(StringRef Id, int64_t Val); @@ -1422,6 +1426,7 @@ private: bool ParseDirectivePALMetadataBegin(); bool ParseDirectivePALMetadata(); bool ParseDirectiveAMDGPULDS(); + bool ParseDirectiveAMDGPUInfo(); /// Common code to parse out a block of text (typically YAML) between start and /// end directives. 
@@ -1676,6 +1681,7 @@ public: uint64_t &ErrorInfo, bool MatchingInlineAsm) override; bool ParseDirective(AsmToken DirectiveID) override; + void onEndOfFile() override; ParseStatus parseOperand(OperandVector &Operands, StringRef Mnemonic, OperandMode Mode = OperandMode_Default); StringRef parseMnemonicSuffix(StringRef Name); @@ -6741,6 +6747,110 @@ bool AMDGPUAsmParser::ParseDirectiveAMDGPULDS() { return false; } +bool AMDGPUAsmParser::ParseDirectiveAMDGPUInfo() { + if (getParser().checkForValidSection()) + return true; + + StringRef FuncName; + if (getParser().parseIdentifier(FuncName)) + return TokError("expected symbol name after .amdgpu_info"); + + MCSymbol *FuncSym = getContext().getOrCreateSymbol(FuncName); + AMDGPU::FuncInfo FI; + FI.Sym = FuncSym; + bool HasScalarAttrs = false; + + while (true) { + while (trySkipToken(AsmToken::EndOfStatement)) + ; + + StringRef ID; + SMLoc IDLoc = getLoc(); + if (!parseId(ID, "expected directive or .end_amdgpu_info")) + return true; + + if (ID == ".end_amdgpu_info") + break; + + // Every per-entry directive shares the `.amdgpu_` namespace prefix; strip + // it once and dispatch on the distinguishing suffix below. The unstripped + // ID is preserved for diagnostics. 
+ StringRef Dir = ID; + if (!Dir.consume_front(".amdgpu_")) + return Error(IDLoc, "unknown .amdgpu_info directive '" + ID + "'"); + + if (Dir == "flags") { + int64_t Val; + if (getParser().parseAbsoluteExpression(Val)) + return true; + auto Flags = static_cast<AMDGPU::FuncInfoFlags>(Val); + FI.UsesVCC = !!(Flags & AMDGPU::FuncInfoFlags::FUNC_USES_VCC); + FI.UsesFlatScratch = + !!(Flags & AMDGPU::FuncInfoFlags::FUNC_USES_FLAT_SCRATCH); + FI.HasDynStack = !!(Flags & AMDGPU::FuncInfoFlags::FUNC_HAS_DYN_STACK); + HasScalarAttrs = true; + } else if (Dir == "num_sgpr") { + int64_t Val; + if (getParser().parseAbsoluteExpression(Val)) + return true; + FI.NumSGPR = static_cast<uint32_t>(Val); + HasScalarAttrs = true; + } else if (Dir == "num_vgpr") { + int64_t Val; + if (getParser().parseAbsoluteExpression(Val)) + return true; + FI.NumArchVGPR = static_cast<uint32_t>(Val); + HasScalarAttrs = true; + } else if (Dir == "num_agpr") { + int64_t Val; + if (getParser().parseAbsoluteExpression(Val)) + return true; + FI.NumAccVGPR = static_cast<uint32_t>(Val); + HasScalarAttrs = true; + } else if (Dir == "private_segment_size") { + int64_t Val; + if (getParser().parseAbsoluteExpression(Val)) + return true; + FI.PrivateSegmentSize = static_cast<uint32_t>(Val); + HasScalarAttrs = true; + } else if (Dir == "use") { + StringRef ResName; + if (getParser().parseIdentifier(ResName)) + return TokError("expected resource symbol for .amdgpu_use"); + InfoData.Uses.push_back( + {FuncSym, getContext().getOrCreateSymbol(ResName)}); + } else if (Dir == "call") { + StringRef DstName; + if (getParser().parseIdentifier(DstName)) + return TokError("expected callee symbol for .amdgpu_call"); + InfoData.Calls.push_back( + {FuncSym, getContext().getOrCreateSymbol(DstName)}); + } else if (Dir == "indirect_call") { + std::string TypeId; + if (getParser().parseEscapedString(TypeId)) + return TokError("expected type ID string for .amdgpu_indirect_call"); + InfoData.IndirectCalls.push_back({FuncSym, 
std::move(TypeId)}); + } else if (Dir == "typeid") { + std::string TypeId; + if (getParser().parseEscapedString(TypeId)) + return TokError("expected type ID string for .amdgpu_typeid"); + InfoData.TypeIds.push_back({FuncSym, std::move(TypeId)}); + } else { + return Error(IDLoc, "unknown .amdgpu_info directive '" + ID + "'"); + } + } + + if (HasScalarAttrs) + InfoData.Funcs.push_back(std::move(FI)); + HasInfoData = true; + return false; +} + +void AMDGPUAsmParser::onEndOfFile() { + if (HasInfoData) + getTargetStreamer().emitAMDGPUInfo(InfoData); +} + bool AMDGPUAsmParser::ParseDirective(AsmToken DirectiveID) { StringRef IDVal = DirectiveID.getString(); @@ -6778,6 +6888,9 @@ bool AMDGPUAsmParser::ParseDirective(AsmToken DirectiveID) { if (IDVal == ".amdgpu_lds") return ParseDirectiveAMDGPULDS(); + if (IDVal == ".amdgpu_info") + return ParseDirectiveAMDGPUInfo(); + if (IDVal == PALMD::AssemblerDirectiveBegin) return ParseDirectivePALMetadataBegin(); diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp index d276bab0ff3b..aefe11608d15 100644 --- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp +++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp @@ -25,7 +25,9 @@ #include "llvm/MC/MCELFObjectWriter.h" #include "llvm/MC/MCELFStreamer.h" #include "llvm/MC/MCSubtargetInfo.h" +#include "llvm/MC/StringTableBuilder.h" #include "llvm/Support/AMDGPUMetadata.h" +#include "llvm/Support/AMDGPUObjLinkingInfo.h" #include "llvm/Support/AMDHSAKernelDescriptor.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/FormattedStream.h" @@ -664,6 +666,103 @@ void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor( OS << "\t.end_amdhsa_kernel\n"; } +namespace { +/// Callback type invoked by \c forEachInfoScope for each function scope in +/// the canonical iteration order. 
The scope is emitted exactly once per +/// unique \p Sym regardless of how many flat entries reference it. +using InfoScopeEmitter = function_ref<void( + MCSymbol *Sym, const AMDGPU::FuncInfo *Info, ArrayRef<MCSymbol *> Uses, + ArrayRef<MCSymbol *> Calls, ArrayRef<StringRef> IndirectCallTypeIds, + ArrayRef<StringRef> TypeIds)>; + +/// Group the flat edge lists in \p Data by source function symbol and drive +/// per-scope emission. A scope is opened for every function with attached +/// info and for every function that appears only as an edge source; each +/// scope is emitted exactly once. Both the asm and ELF streamers share this +/// iteration logic and only differ in the per-scope emission callback. +static void forEachInfoScope(const AMDGPU::InfoSectionData &Data, + InfoScopeEmitter Emit) { + DenseMap<MCSymbol *, SmallVector<MCSymbol *, 2>> FuncUses; + DenseMap<MCSymbol *, SmallVector<MCSymbol *, 4>> FuncCalls; + DenseMap<MCSymbol *, SmallVector<StringRef, 2>> FuncIndirectCalls; + DenseMap<MCSymbol *, SmallVector<StringRef, 1>> FuncTypeIds; + for (const auto &[Func, Res] : Data.Uses) + FuncUses[Func].push_back(Res); + for (const auto &[Src, Dst] : Data.Calls) + FuncCalls[Src].push_back(Dst); + for (const auto &[Func, TypeId] : Data.IndirectCalls) + FuncIndirectCalls[Func].push_back(TypeId); + for (const auto &[Sym, TypeId] : Data.TypeIds) + FuncTypeIds[Sym].push_back(TypeId); + + DenseSet<MCSymbol *> Emitted; + auto EmitIfNew = [&](MCSymbol *Sym, const AMDGPU::FuncInfo *Info) { + if (!Emitted.insert(Sym).second) + return; + ArrayRef<MCSymbol *> Uses, Calls; + ArrayRef<StringRef> IndirectCallTypeIds, TypeIds; + if (auto It = FuncUses.find(Sym); It != FuncUses.end()) + Uses = It->second; + if (auto It = FuncCalls.find(Sym); It != FuncCalls.end()) + Calls = It->second; + if (auto It = FuncIndirectCalls.find(Sym); It != FuncIndirectCalls.end()) + IndirectCallTypeIds = It->second; + if (auto It = FuncTypeIds.find(Sym); It != FuncTypeIds.end()) + TypeIds = 
It->second; + Emit(Sym, Info, Uses, Calls, IndirectCallTypeIds, TypeIds); + }; + + for (const AMDGPU::FuncInfo &Func : Data.Funcs) + EmitIfNew(Func.Sym, &Func); + // Emit scopes for functions that only appear as edge sources (e.g. typeid + // tags on address-taken declarations, or callers of external functions). + for (const auto &[Sym, TypeId] : Data.TypeIds) + EmitIfNew(Sym, nullptr); + for (const auto &[Sym, Res] : Data.Uses) + EmitIfNew(Sym, nullptr); + for (const auto &[Sym, Dst] : Data.Calls) + EmitIfNew(Sym, nullptr); + for (const auto &[Sym, TypeId] : Data.IndirectCalls) + EmitIfNew(Sym, nullptr); +} +} // namespace + +void AMDGPUTargetAsmStreamer::emitAMDGPUInfo( + const AMDGPU::InfoSectionData &Data) { + forEachInfoScope(Data, [&](MCSymbol *Sym, const AMDGPU::FuncInfo *Info, + ArrayRef<MCSymbol *> Uses, + ArrayRef<MCSymbol *> Calls, + ArrayRef<StringRef> IndirectCallTypeIds, + ArrayRef<StringRef> TypeIds) { + OS << "\t.amdgpu_info " << Sym->getName() << '\n'; + if (Info) { + AMDGPU::FuncInfoFlags Flags{}; + if (Info->UsesVCC) + Flags |= AMDGPU::FuncInfoFlags::FUNC_USES_VCC; + if (Info->UsesFlatScratch) + Flags |= AMDGPU::FuncInfoFlags::FUNC_USES_FLAT_SCRATCH; + if (Info->HasDynStack) + Flags |= AMDGPU::FuncInfoFlags::FUNC_HAS_DYN_STACK; + OS << "\t\t.amdgpu_flags " << llvm::to_underlying(Flags) << '\n'; + OS << "\t\t.amdgpu_num_sgpr " << Info->NumSGPR << '\n'; + OS << "\t\t.amdgpu_num_vgpr " << Info->NumArchVGPR << '\n'; + if (Info->NumAccVGPR) + OS << "\t\t.amdgpu_num_agpr " << Info->NumAccVGPR << '\n'; + OS << "\t\t.amdgpu_private_segment_size " << Info->PrivateSegmentSize + << '\n'; + } + for (MCSymbol *Res : Uses) + OS << "\t\t.amdgpu_use " << Res->getName() << '\n'; + for (MCSymbol *Dst : Calls) + OS << "\t\t.amdgpu_call " << Dst->getName() << '\n'; + for (StringRef TypeId : IndirectCallTypeIds) + OS << "\t\t.amdgpu_indirect_call \"" << TypeId << "\"\n"; + for (StringRef TypeId : TypeIds) + OS << "\t\t.amdgpu_typeid \"" << TypeId << "\"\n"; + OS << 
"\t.end_amdgpu_info\n\n"; + }); +} + //===----------------------------------------------------------------------===// // AMDGPUTargetELFStreamer //===----------------------------------------------------------------------===// @@ -1065,3 +1164,83 @@ void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor( for (uint32_t i = 0; i < sizeof(amdhsa::kernel_descriptor_t::reserved3); ++i) Streamer.emitInt8(0u); } + +void AMDGPUTargetELFStreamer::emitAMDGPUInfo( + const AMDGPU::InfoSectionData &Data) { + MCELFStreamer &S = getStreamer(); + MCContext &Context = S.getContext(); + + StringTableBuilder StrTab(StringTableBuilder::ELF); + auto getOrAddString = [&](StringRef Str) -> uint32_t { + if (Str.empty()) + return UINT32_MAX; + return StrTab.add(Str); + }; + + auto EmitU32Entry = [&](AMDGPU::InfoKind Kind, uint32_t Val) { + S.emitInt8(static_cast<uint8_t>(Kind)); + S.emitInt8(4); + S.emitInt32(Val); + }; + auto EmitSymEntry = [&](AMDGPU::InfoKind Kind, MCSymbol *Sym) { + S.emitInt8(static_cast<uint8_t>(Kind)); + S.emitInt8(8); + S.emitValue(MCSymbolRefExpr::create(Sym, Context), 8); + }; + + S.pushSection(); + MCSectionELF *InfoSec = Context.getELFSection( + ".amdgpu.info", ELF::SHT_PROGBITS, ELF::SHF_EXCLUDE); + S.switchSection(InfoSec); + + forEachInfoScope(Data, [&](MCSymbol *Sym, const AMDGPU::FuncInfo *Info, + ArrayRef<MCSymbol *> Uses, + ArrayRef<MCSymbol *> Calls, + ArrayRef<StringRef> IndirectCallTypeIds, + ArrayRef<StringRef> TypeIds) { + EmitSymEntry(AMDGPU::InfoKind::INFO_FUNC, Sym); + + if (Info) { + AMDGPU::FuncInfoFlags Flags{}; + if (Info->UsesVCC) + Flags |= AMDGPU::FuncInfoFlags::FUNC_USES_VCC; + if (Info->UsesFlatScratch) + Flags |= AMDGPU::FuncInfoFlags::FUNC_USES_FLAT_SCRATCH; + if (Info->HasDynStack) + Flags |= AMDGPU::FuncInfoFlags::FUNC_HAS_DYN_STACK; + EmitU32Entry(AMDGPU::InfoKind::INFO_FLAGS, llvm::to_underlying(Flags)); + EmitU32Entry(AMDGPU::InfoKind::INFO_NUM_SGPR, Info->NumSGPR); + EmitU32Entry(AMDGPU::InfoKind::INFO_NUM_VGPR, 
Info->NumArchVGPR); + // INFO_NUM_AGPR is only emitted when the function actually uses AGPRs, + // since AGPRs are not available on all architectures. + if (Info->NumAccVGPR) + EmitU32Entry(AMDGPU::InfoKind::INFO_NUM_AGPR, Info->NumAccVGPR); + EmitU32Entry(AMDGPU::InfoKind::INFO_PRIVATE_SEGMENT_SIZE, + Info->PrivateSegmentSize); + } + + for (MCSymbol *Res : Uses) + EmitSymEntry(AMDGPU::InfoKind::INFO_USE, Res); + for (MCSymbol *Dst : Calls) + EmitSymEntry(AMDGPU::InfoKind::INFO_CALL, Dst); + for (StringRef TypeId : IndirectCallTypeIds) { + EmitU32Entry(AMDGPU::InfoKind::INFO_INDIRECT_CALL, + getOrAddString(TypeId)); + } + for (StringRef TypeId : TypeIds) + EmitU32Entry(AMDGPU::InfoKind::INFO_TYPEID, getOrAddString(TypeId)); + }); + + if (!StrTab.empty()) { + StrTab.finalizeInOrder(); + MCSectionELF *Sec = Context.getELFSection(".amdgpu.strtab", ELF::SHT_STRTAB, + ELF::SHF_EXCLUDE); + S.switchSection(Sec); + SmallString<128> Buf; + raw_svector_ostream OS(Buf); + StrTab.write(OS); + S.emitBytes(Buf); + } + + S.popSection(); +} diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h index 3a0d8dcd2d27..ca1fe3ccf3da 100644 --- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h +++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h @@ -11,7 +11,10 @@ #include "Utils/AMDGPUBaseInfo.h" #include "Utils/AMDGPUPALMetadata.h" +#include "llvm/ADT/SmallVector.h" #include "llvm/MC/MCStreamer.h" +#include <string> +#include <utility> namespace llvm { @@ -26,6 +29,27 @@ struct MCKernelDescriptor; namespace HSAMD { struct Metadata; } + +struct FuncInfo { + uint32_t NumSGPR = 0; + uint32_t NumArchVGPR = 0; + uint32_t NumAccVGPR = 0; + uint32_t PrivateSegmentSize = 0; + bool UsesVCC = false; + bool UsesFlatScratch = false; + bool HasDynStack = false; + + MCSymbol *Sym = nullptr; +}; + +struct InfoSectionData { + SmallVector<FuncInfo, 8> Funcs; + SmallVector<std::pair<MCSymbol *, MCSymbol 
*>, 4> Uses; + SmallVector<std::pair<MCSymbol *, MCSymbol *>, 8> Calls; + SmallVector<std::pair<MCSymbol *, std::string>, 4> IndirectCalls; + SmallVector<std::pair<MCSymbol *, std::string>, 4> TypeIds; +}; + } // namespace AMDGPU class AMDGPUTargetStreamer : public MCTargetStreamer { @@ -104,6 +128,8 @@ public: const MCExpr *ReserveVCC, const MCExpr *ReserveFlatScr) {} + virtual void emitAMDGPUInfo(const AMDGPU::InfoSectionData &Data) {} + static StringRef getArchNameFromElfMach(unsigned ElfMach); static unsigned getElfMach(StringRef GPU); @@ -168,6 +194,8 @@ public: const MCExpr *NextVGPR, const MCExpr *NextSGPR, const MCExpr *ReserveVCC, const MCExpr *ReserveFlatScr) override; + + void emitAMDGPUInfo(const AMDGPU::InfoSectionData &Data) override; }; class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer { @@ -221,6 +249,8 @@ public: const MCExpr *NextVGPR, const MCExpr *NextSGPR, const MCExpr *ReserveVCC, const MCExpr *ReserveFlatScr) override; + + void emitAMDGPUInfo(const AMDGPU::InfoSectionData &Data) override; }; } #endif diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-agpr.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-agpr.ll new file mode 100644 index 000000000000..6442d6f6501c --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-agpr.ll @@ -0,0 +1,23 @@ +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -amdgpu-enable-object-linking < %s | FileCheck %s + +; Verify that .amdgpu_num_agpr IS emitted when AGPRs are used on a target +; that supports them (gfx908 has a separate AGPR file). 
+ +declare <4 x float> @llvm.amdgcn.mfma.f32.4x4x1f32(float, float, <4 x float>, i32, i32, i32) + +define void @func_with_agpr(float %a, float %b, ptr addrspace(1) %out) { + %result = call <4 x float> @llvm.amdgcn.mfma.f32.4x4x1f32(float %a, float %b, <4 x float> zeroinitializer, i32 0, i32 0, i32 0) + store <4 x float> %result, ptr addrspace(1) %out + ret void +} + +define amdgpu_kernel void @kern(float %a, float %b, ptr addrspace(1) %out) { + call void @func_with_agpr(float %a, float %b, ptr addrspace(1) %out) + ret void +} + +; CHECK: .amdgpu_info func_with_agpr +; CHECK: .amdgpu_num_agpr {{[1-9][0-9]*}} +; CHECK: .end_amdgpu_info +; CHECK: .amdgpu_info kern +; CHECK: .end_amdgpu_info diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-callgraph.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-callgraph.ll new file mode 100644 index 000000000000..0297a2a6e049 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-callgraph.ll @@ -0,0 +1,62 @@ +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r --sections - | FileCheck %s +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=asm < %s | FileCheck %s --check-prefix=ASM --implicit-check-not=.amdgpu_num_agpr + +; Test that the unified .amdgpu.info section (.amdgpu_info blocks in assembly) is +; emitted with correct relocations when object linking is enabled. + +declare void @extern_func() +declare void @tail_extern() + +; The .amdgpu.info section should exist as SHT_PROGBITS with SHF_EXCLUDE. +; CHECK: Section { +; CHECK: Name: .amdgpu.info +; CHECK: Type: SHT_PROGBITS +; CHECK: Flags [ +; CHECK: SHF_EXCLUDE +; CHECK: ] + +; Symbol references in the binary resource metadata still use R_AMDGPU_ABS64 relocations. 
+; CHECK-DAG: R_AMDGPU_ABS64 my_kernel +; CHECK-DAG: R_AMDGPU_ABS64 helper +; CHECK-DAG: R_AMDGPU_ABS64 extern_func +; COM: Tail-call callee must still be recorded as an INFO_CALL edge. +; CHECK-DAG: R_AMDGPU_ABS64 tail_helper +; CHECK-DAG: R_AMDGPU_ABS64 tail_extern + +; COM: Assembly: per-function .amdgpu_info blocks. Numeric operand values are +; COM: target-dependent, so they are matched with regexes rather than exact +; COM: values. +; ASM-DAG: .amdgpu_info helper +; ASM-DAG: .amdgpu_flags {{[0-9]+}} +; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}} +; ASM-DAG: .amdgpu_num_sgpr {{[0-9]+}} +; ASM-DAG: .amdgpu_private_segment_size {{[0-9]+}} +; ASM-DAG: .amdgpu_call extern_func +; ASM-DAG: .end_amdgpu_info +; ASM-DAG: .amdgpu_info my_kernel +; ASM-DAG: .amdgpu_flags {{[0-9]+}} +; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}} +; ASM-DAG: .amdgpu_num_sgpr {{[0-9]+}} +; ASM-DAG: .amdgpu_private_segment_size {{[0-9]+}} +; ASM-DAG: .amdgpu_call helper +; ASM-DAG: .end_amdgpu_info + +; COM: A tail call is lowered to SI_TCRETURN (isCall = 1). Verify that the +; COM: callee edge is still captured in the .amdgpu_info block of the caller.
+; ASM-DAG: .amdgpu_info tail_helper +; ASM-DAG: .amdgpu_call tail_extern +; ASM-DAG: .end_amdgpu_info + +define void @helper() { + call void @extern_func() + ret void +} + +define amdgpu_kernel void @my_kernel() { + call void @helper() + ret void +} + +define void @tail_helper() { + tail call void @tail_extern() + ret void +} diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll index 46d4c8db00f0..02740e3bb0a1 100644 --- a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll +++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll @@ -1,9 +1,11 @@ -; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -amdgpu-enable-object-linking < %s | FileCheck %s +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -amdgpu-enable-object-linking < %s | FileCheck %s --implicit-check-not=.amdgpu_num_agpr +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r --sections - | FileCheck %s --check-prefix=ELF ; Verify object linking codegen for named barriers on GFX1250: ; 1. Barrier instructions use M0-based forms with relocation references -; 2. group_segment_fixed_size = 0 (linker patches it) -; 3. Named barrier is emitted as an SHN_AMDGPU_LDS symbol (.amdgpu_lds) +; 2. .amdgpu.info section records the barrier as an LDS use edge +; 3. group_segment_fixed_size = 0 (linker patches it) +; 4. Named barrier is emitted as an SHN_AMDGPU_LDS symbol (.amdgpu_lds) @bar = internal addrspace(3) global [2 x target("amdgcn.named.barrier", 0)] poison @@ -13,12 +15,29 @@ ; CHECK: s_barrier_join m0 ; CHECK: s_barrier_wait 1 -; KD: group_segment_fixed_size = 0 (linker will patch). 
; CHECK: .amdhsa_group_segment_fixed_size 0 -; LDS symbol declaration +; CHECK: .amdgpu_info kernel +; CHECK: .amdgpu_flags {{[0-9]+}} +; CHECK: .amdgpu_num_sgpr {{[0-9]+}} +; CHECK: .amdgpu_num_vgpr {{[0-9]+}} +; CHECK: .amdgpu_private_segment_size {{[0-9]+}} +; CHECK: .amdgpu_use __amdgpu_named_barrier.bar{{[^ ,]*}} +; CHECK: .amdgpu_call helper +; CHECK: .end_amdgpu_info + ; CHECK: .amdgpu_lds __amdgpu_named_barrier.bar{{[^ ,]*}}, 32, 4 +; ELF: Section { +; ELF: Name: .amdgpu.info +; ELF: Type: SHT_PROGBITS +; ELF: Flags [ +; ELF: SHF_EXCLUDE + +; ELF-DAG: R_AMDGPU_ABS64 kernel +; ELF-DAG: R_AMDGPU_ABS64 __amdgpu_named_barrier.bar{{[^ ]*}} +; ELF-DAG: R_AMDGPU_ABS64 helper + define amdgpu_kernel void @kernel() { call void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) @bar, i32 3) call void @llvm.amdgcn.s.barrier.join(ptr addrspace(3) @bar) diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-typeid.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-typeid.ll new file mode 100644 index 000000000000..afaf7cd19940 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-typeid.ll @@ -0,0 +1,221 @@ +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r - | FileCheck %s +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=asm < %s | FileCheck %s --check-prefix=ASM --implicit-check-not=.amdgpu_num_agpr + +; Test ABI register-size type ID generation for various function types. +; The type ID encodes each parameter/return by bit width: v=void, i=<=32-bit, +; l=33-64-bit, and >64-bit types widen to ceil(bits/32) x "i". Types with the +; same register footprint share an encoding (e.g. float(float) and i32(i32) +; both produce "ii"). 
Coverage here spans scalars, vectors (whose size is the +; total element bit width), pointers across address spaces (AS 1 is 64-bit on +; amdhsa; AS 3 / AS 5 are 32-bit), and small integer types (i1/i8/i16) that +; are not natively passed as function arguments but ABI-promoted to i32 slots +; -- they still encode as "i", matching an i32 parameter. +; +; Cross-TU coverage: an address-taken declaration (defined in another TU) still +; gets an .amdgpu_info scope with .amdgpu_typeid, but no per-function resource +; counts since its body isn't available here. +declare void @extern_decl(i32) + +define void @void_void() { + ret void +} + +define i32 @i32_i32(i32 %x) { + ret i32 %x +} + +define void @void_ptr_i32(ptr %p, i32 %x) { + ret void +} + +define i64 @i64_i64_i64(i64 %a, i64 %b) { + ret i64 %a +} + +define float @float_float(float %x) { + ret float %x +} + +; Address-space pointer widths: AS 1 (global) is 64-bit -> "l"; AS 3 (LDS) and +; AS 5 (private) are 32-bit -> "i". +define void @ptr_addrspaces(ptr addrspace(1) %g, ptr addrspace(3) %l, ptr addrspace(5) %p) { + ret void +} + +; Vector types: encoded by total bit width. <2 x i32> = 64 bits -> "l"; +; <4 x i32>/<4 x float>/<2 x i64> = 128 bits -> "iiii" each. +define <4 x i32> @vectors(<2 x i32> %a, <4 x float> %b, <2 x i64> %c) { + ret <4 x i32> zeroinitializer +} + +; Small integer types (i1/i8/i16) are ABI-promoted to i32 register slots on +; AMDGPU. They all collapse to "i" under the bit-width scheme, matching an i32 +; parameter, so callers declared as void(i32, ...) remain compatible with +; callees taking void(i8, ...). signext/zeroext attributes describe the +; promotion mode and do not affect the encoding. +define void @promoted_small_ints(i8 signext %a, i16 zeroext %b, i1 %c) { + ret void +} + +; Wider non-vector scalars: double is "l" (64 bits); i128 widens to 4 x "i" +; (ceil(128/32)). 
+define double @wide_scalars(double %a, i128 %b) { + ret double %a +} + +%Struct16 = type { i32, i32, i32, i32 } + +; byval / byref struct pointer parameters encode as a single pointer register +; slot, the same as a plain pointer in the same address space. byval describes +; a caller-side stack copy and byref describes a pointer handed through +; unchanged; neither changes the callee's register footprint (one pointer), +; so both collapse to "i" (for 32-bit AS) or "l" (for 64-bit AS). Compare +; with the plain-pointer encodings in @ptr_addrspaces. +define void @byval_struct_private(ptr addrspace(5) byval(%Struct16) %p) { + ret void +} + +define void @byref_struct_constant(ptr addrspace(4) byref(%Struct16) %p) { + ret void +} + +; Indirect-call type IDs are derived from the call instruction's FunctionType +; using the same rules, so they match the .amdgpu_typeid of an ABI-compatible +; address-taken callee. Duplicate signatures within one function are +; deduplicated (the second void() call below shares "v" with the first and +; yields only one .amdgpu_indirect_call entry; likewise a plain +; ptr addrspace(5) call and a ptr addrspace(5) byval(...) call both encode as +; "vi" and collapse to one entry). 
+define void @icaller(ptr %f_void, ptr %f_ptrs, ptr %f_vec, ptr %f_small, ptr %f_wide, ptr %f_dup, ptr %f_priv, ptr %f_priv_byval, ptr %f_const, ptr %f_const_byref) { + call void %f_void() + call void %f_ptrs(ptr addrspace(1) null, ptr addrspace(3) null, ptr addrspace(5) null) + %v = call <4 x i32> %f_vec(<2 x i32> zeroinitializer, <4 x float> zeroinitializer, <2 x i64> zeroinitializer) + call void %f_small(i8 signext 0, i16 zeroext 0, i1 false) + %d = call double %f_wide(double 0.0, i128 0) + call void %f_dup() + call void %f_priv(ptr addrspace(5) null) + call void %f_priv_byval(ptr addrspace(5) byval(%Struct16) null) + call void %f_const(ptr addrspace(4) null) + call void %f_const_byref(ptr addrspace(4) byref(%Struct16) null) + ret void +} + +; Take the address of each function so they appear as resource nodes. +define void @taker() { + %p0 = alloca ptr, addrspace(5) + store volatile ptr @void_void, ptr addrspace(5) %p0 + store volatile ptr @i32_i32, ptr addrspace(5) %p0 + store volatile ptr @void_ptr_i32, ptr addrspace(5) %p0 + store volatile ptr @i64_i64_i64, ptr addrspace(5) %p0 + store volatile ptr @float_float, ptr addrspace(5) %p0 + store volatile ptr @ptr_addrspaces, ptr addrspace(5) %p0 + store volatile ptr @vectors, ptr addrspace(5) %p0 + store volatile ptr @promoted_small_ints, ptr addrspace(5) %p0 + store volatile ptr @wide_scalars, ptr addrspace(5) %p0 + store volatile ptr @byval_struct_private, ptr addrspace(5) %p0 + store volatile ptr @byref_struct_constant, ptr addrspace(5) %p0 + store volatile ptr @extern_decl, ptr addrspace(5) %p0 + ret void +} + +define amdgpu_kernel void @kern() { + call void @taker() + call void @icaller(ptr @void_void, ptr @ptr_addrspaces, ptr @vectors, + ptr @promoted_small_ints, ptr @wide_scalars, + ptr @void_void, + ptr @byval_struct_private, ptr @byval_struct_private, + ptr @byref_struct_constant, ptr @byref_struct_constant) + ret void +} + +; CHECK-DAG: R_AMDGPU_ABS64 void_void +; CHECK-DAG: R_AMDGPU_ABS64 i32_i32 +; 
CHECK-DAG: R_AMDGPU_ABS64 void_ptr_i32 +; CHECK-DAG: R_AMDGPU_ABS64 i64_i64_i64 +; CHECK-DAG: R_AMDGPU_ABS64 float_float +; CHECK-DAG: R_AMDGPU_ABS64 ptr_addrspaces +; CHECK-DAG: R_AMDGPU_ABS64 vectors +; CHECK-DAG: R_AMDGPU_ABS64 promoted_small_ints +; CHECK-DAG: R_AMDGPU_ABS64 wide_scalars +; CHECK-DAG: R_AMDGPU_ABS64 byval_struct_private +; CHECK-DAG: R_AMDGPU_ABS64 byref_struct_constant +; CHECK-DAG: R_AMDGPU_ABS64 extern_decl +; CHECK-DAG: R_AMDGPU_ABS64 icaller +; CHECK-DAG: R_AMDGPU_ABS64 taker +; CHECK-DAG: R_AMDGPU_ABS64 kern + +; ASM-DAG: .amdgpu_info void_void +; ASM-DAG: .amdgpu_flags 0 +; ASM-DAG: .amdgpu_typeid "v" +; ASM-DAG: .end_amdgpu_info +; ASM-DAG: .amdgpu_info i32_i32 +; ASM-DAG: .amdgpu_flags 0 +; ASM-DAG: .amdgpu_typeid "ii" +; ASM-DAG: .end_amdgpu_info +; ASM-DAG: .amdgpu_info void_ptr_i32 +; ASM-DAG: .amdgpu_flags 0 +; ASM-DAG: .amdgpu_typeid "vli" +; ASM-DAG: .end_amdgpu_info +; ASM-DAG: .amdgpu_info i64_i64_i64 +; ASM-DAG: .amdgpu_flags 0 +; ASM-DAG: .amdgpu_typeid "lll" +; ASM-DAG: .end_amdgpu_info +; ASM-DAG: .amdgpu_info float_float +; ASM-DAG: .amdgpu_flags 0 +; ASM-DAG: .amdgpu_typeid "ii" +; ASM-DAG: .end_amdgpu_info +; ASM-DAG: .amdgpu_info ptr_addrspaces +; ASM-DAG: .amdgpu_flags 0 +; ASM-DAG: .amdgpu_typeid "vlii" +; ASM-DAG: .end_amdgpu_info +; ASM-DAG: .amdgpu_info vectors +; ASM-DAG: .amdgpu_flags 0 +; ASM-DAG: .amdgpu_typeid "iiiiliiiiiiii" +; ASM-DAG: .end_amdgpu_info +; ASM-DAG: .amdgpu_info promoted_small_ints +; ASM-DAG: .amdgpu_flags 0 +; ASM-DAG: .amdgpu_typeid "viii" +; ASM-DAG: .end_amdgpu_info +; ASM-DAG: .amdgpu_info wide_scalars +; ASM-DAG: .amdgpu_flags 0 +; ASM-DAG: .amdgpu_typeid "lliiii" +; ASM-DAG: .end_amdgpu_info +; ASM-DAG: .amdgpu_info byval_struct_private +; ASM-DAG: .amdgpu_flags 0 +; ASM-DAG: .amdgpu_typeid "vi" +; ASM-DAG: .end_amdgpu_info +; ASM-DAG: .amdgpu_info byref_struct_constant +; ASM-DAG: .amdgpu_flags 0 +; ASM-DAG: .amdgpu_typeid "vl" +; ASM-DAG: .end_amdgpu_info +; COM: Address-taken 
declaration: only the type-ID appears in its scope, with +; COM: no per-function resource counts (the body lives in another TU). +; ASM-DAG: .amdgpu_info extern_decl +; ASM-DAG: .amdgpu_typeid "vi" +; ASM-DAG: .end_amdgpu_info +; COM: @icaller's indirect call type IDs mirror the @void_void / @ptr_addrspaces / +; COM: @vectors / @promoted_small_ints / @wide_scalars .amdgpu_typeid encodings +; COM: above, proving ABI compatibility. The duplicate void() indirect call is +; COM: deduplicated, so "v" appears only once. The byval/byref pairs dedupe +; COM: with their plain-pointer counterparts: "vi" (AS 5) and "vl" (AS 4) each +; COM: appear once despite two call sites apiece. +; ASM-DAG: .amdgpu_info icaller +; ASM-DAG: .amdgpu_flags 1 +; ASM-DAG: .amdgpu_indirect_call "v" +; ASM-DAG: .amdgpu_indirect_call "vlii" +; ASM-DAG: .amdgpu_indirect_call "iiiiliiiiiiii" +; ASM-DAG: .amdgpu_indirect_call "viii" +; ASM-DAG: .amdgpu_indirect_call "lliiii" +; ASM-DAG: .amdgpu_indirect_call "vi" +; ASM-DAG: .amdgpu_indirect_call "vl" +; ASM-DAG: .end_amdgpu_info +; ASM-DAG: .amdgpu_info taker +; ASM-DAG: .amdgpu_flags 0 +; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}} +; ASM-DAG: .end_amdgpu_info +; COM: The kernel scope is present but carries no type IDs of its own (kernels +; COM: aren't indirect-call targets). Direct-call edges from the kernel body are +; COM: exercised separately in the callgraph test. 
+; ASM-DAG: .amdgpu_info kern +; ASM-DAG: .amdgpu_flags {{[0-9]+}} +; ASM-DAG: .end_amdgpu_info diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll index 878f3abf7ccf..0020c2272d23 100644 --- a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll +++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll @@ -1,17 +1,18 @@ -; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking < %s | FileCheck -check-prefixes=ASM %s -; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r --syms - | FileCheck -check-prefixes=ELF %s +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking < %s | FileCheck -check-prefixes=ASM %s --implicit-check-not=.amdgpu_num_agpr +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r --syms --sections - | FileCheck -check-prefixes=ELF %s ; Test that with object linking enabled, external LDS declarations produce -; @abs32@lo relocations, SHN_AMDGPU_LDS symbols, and .amdgpu_lds directives. -; Covers multiple LDS variables with different sizes and alignments (including -; zero-sized dynamic LDS), usage from both kernels and device functions, and +; @abs32@lo relocations, SHN_AMDGPU_LDS symbols, .amdgpu_lds directives, +; and .amdgpu_use edges in the .amdgpu.info section. Covers multiple LDS +; variables with different sizes and alignments (including zero-sized dynamic +; LDS), usage from both kernels and device functions, and ; group_segment_fixed_size = 0 (linker patches via binary patching). @lds_large = external addrspace(3) global [256 x i8], align 16 @lds_small = external addrspace(3) global [128 x i8], align 4 @lds_dynamic = external addrspace(3) global [0 x i8], align 8 -; --- Assembly checks --- +; Instruction-level relocation checks. 
; ASM-LABEL: {{^}}device_func: ; ASM: v_add_u32_e32 v{{[0-9]+}}, lds_large@abs32@lo, v{{[0-9]+}} @@ -19,17 +20,49 @@ ; ASM-DAG: s_add_i32 s{{[0-9]+}}, s{{[0-9]+}}, lds_small@abs32@lo ; ASM-DAG: s_add_i32 s{{[0-9]+}}, s{{[0-9]+}}, lds_dynamic@abs32@lo +; .amdgpu.info section with LDS use edges. +; ASM-DAG: .amdgpu_info device_func +; ASM-DAG: .amdgpu_flags {{[0-9]+}} +; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}} +; ASM-DAG: .amdgpu_num_sgpr {{[0-9]+}} +; ASM-DAG: .amdgpu_private_segment_size {{[0-9]+}} +; ASM-DAG: .amdgpu_use lds_large +; ASM-DAG: .end_amdgpu_info +; ASM-DAG: .amdgpu_info test_kernel +; ASM-DAG: .amdgpu_flags {{[0-9]+}} +; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}} +; ASM-DAG: .amdgpu_num_sgpr {{[0-9]+}} +; ASM-DAG: .amdgpu_private_segment_size {{[0-9]+}} +; ASM-DAG: .amdgpu_use lds_dynamic +; ASM-DAG: .amdgpu_use lds_small +; ASM-DAG: .amdgpu_call device_func +; ASM-DAG: .end_amdgpu_info + +; SHN_AMDGPU_LDS directives. ; ASM-DAG: .amdgpu_lds lds_large, 256, 16 ; ASM-DAG: .amdgpu_lds lds_small, 128, 4 ; ASM-DAG: .amdgpu_lds lds_dynamic, 0, 8 ; ASM: .group_segment_fixed_size: 0 -; --- ELF checks --- +; .amdgpu.info section exists. +; ELF: Section { +; ELF: Name: .amdgpu.info +; ELF: Type: SHT_PROGBITS +; ELF: Flags [ +; ELF: SHF_EXCLUDE + +; Relocations. ; ELF-DAG: R_AMDGPU_ABS32_LO lds_large ; ELF-DAG: R_AMDGPU_ABS32_LO lds_small ; ELF-DAG: R_AMDGPU_ABS32_LO lds_dynamic +; ELF-DAG: R_AMDGPU_ABS64 device_func +; ELF-DAG: R_AMDGPU_ABS64 test_kernel +; ELF-DAG: R_AMDGPU_ABS64 lds_large +; ELF-DAG: R_AMDGPU_ABS64 lds_small +; ELF-DAG: R_AMDGPU_ABS64 lds_dynamic +; SHN_AMDGPU_LDS symbols. 
; ELF-DAG: Name: lds_large ; ELF-DAG: Name: lds_small ; ELF-DAG: Name: lds_dynamic diff --git a/llvm/test/MC/AMDGPU/amdgpu-info-err.s b/llvm/test/MC/AMDGPU/amdgpu-info-err.s new file mode 100644 index 000000000000..22e6d2e29f47 --- /dev/null +++ b/llvm/test/MC/AMDGPU/amdgpu-info-err.s @@ -0,0 +1,43 @@ +// RUN: not llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx900 %s -filetype=null 2>&1 | FileCheck %s + +// Each error case aborts parsing of its enclosing .amdgpu_info block: the +// parser returns on the failing directive, which implicitly exits the block +// (there is no block-open state tracked at the top level), and the next +// test case starts fresh at top level. `.end_amdgpu_info` terminators are +// therefore intentionally omitted -- adding them here would themselves +// become "unknown directive" errors, since `.end_amdgpu_info` is only +// recognised inside the block. + +// Missing function symbol after .amdgpu_info. +.amdgpu_info +// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected symbol name after .amdgpu_info + +// Unknown directive inside a .amdgpu_info block. +.amdgpu_info f_unknown_dir + .amdgpu_bogus 1 +// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: unknown .amdgpu_info directive '.amdgpu_bogus' + +// .amdgpu_use with no resource symbol. +.amdgpu_info f_use_missing + .amdgpu_use +// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected resource symbol for .amdgpu_use + +// .amdgpu_call with no callee symbol. +.amdgpu_info f_call_missing + .amdgpu_call +// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected callee symbol for .amdgpu_call + +// .amdgpu_indirect_call with no type-ID string. +.amdgpu_info f_icall_missing + .amdgpu_indirect_call +// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected type ID string for .amdgpu_indirect_call + +// .amdgpu_typeid with no type-ID string. 
+.amdgpu_info f_typeid_missing + .amdgpu_typeid +// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected type ID string for .amdgpu_typeid + +// Non-identifier token where a directive or .end_amdgpu_info is expected. +.amdgpu_info f_bad_token + 123 +// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected directive or .end_amdgpu_info diff --git a/llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s b/llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s new file mode 100644 index 000000000000..d49890eb0517 --- /dev/null +++ b/llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s @@ -0,0 +1,126 @@ +// RUN: llvm-mc -triple=amdgcn-amd-amdhsa -mcpu=gfx900 -filetype=asm %s | FileCheck --check-prefix=ASM %s +// RUN: llvm-mc -triple=amdgcn-amd-amdhsa -mcpu=gfx900 -filetype=obj %s | llvm-readobj -r --sections --section-data --string-dump=.amdgpu.strtab - | FileCheck --check-prefix=OBJ %s + +// Test that .amdgpu_info directives round-trip through the assembler (asm and +// object emission) and produce the correct TLV-encoded .amdgpu.info section. + + .text + .globl my_kernel + .p2align 8 + .type my_kernel,@function +my_kernel: + s_endpgm +.Lfunc_end0: + .size my_kernel, .Lfunc_end0-my_kernel + + .globl helper + .p2align 2 + .type helper,@function +helper: + s_setpc_b64 s[30:31] +.Lfunc_end1: + .size helper, .Lfunc_end1-helper + + .globl addr_taken_func + .p2align 2 + .type addr_taken_func,@function +addr_taken_func: + s_setpc_b64 s[30:31] +.Lfunc_end2: + .size addr_taken_func, .Lfunc_end2-addr_taken_func + + .globl extern_func + +// COM: Kernel: flags=7 (KERNEL|VCC|FLAT_SCRATCH), resources, call edge, use +// COM: edge, indirect call, and type ID. Non-zero AGPR to verify conditional +// COM: emission. + .amdgpu_info my_kernel + .amdgpu_flags 7 + .amdgpu_num_sgpr 33 + .amdgpu_num_vgpr 32 + .amdgpu_num_agpr 4 + .amdgpu_private_segment_size 0 + .amdgpu_use lds_var + .amdgpu_call helper + .amdgpu_indirect_call "vi" + .end_amdgpu_info + +// COM: Device function: flags=2 (VCC), call edge to external. 
Zero AGPR values +// COM: are omitted from the input; the parser defaults them to 0 and the +// COM: emitter skips them. + .amdgpu_info helper + .amdgpu_flags 2 + .amdgpu_num_sgpr 8 + .amdgpu_num_vgpr 10 + .amdgpu_private_segment_size 16 + .amdgpu_call extern_func + .end_amdgpu_info + +// Address-taken function with type ID. Zero AGPR omitted. + .amdgpu_info addr_taken_func + .amdgpu_flags 0 + .amdgpu_num_sgpr 2 + .amdgpu_num_vgpr 4 + .amdgpu_private_segment_size 0 + .amdgpu_typeid "vi" + .end_amdgpu_info + +// ASM: .amdgpu_info my_kernel +// ASM: .amdgpu_flags 7 +// ASM: .amdgpu_num_sgpr 33 +// ASM: .amdgpu_num_vgpr 32 +// ASM: .amdgpu_num_agpr 4 +// ASM: .amdgpu_private_segment_size 0 +// ASM: .amdgpu_use lds_var +// ASM: .amdgpu_call helper +// ASM: .amdgpu_indirect_call "vi" +// ASM: .end_amdgpu_info + +// ASM: .amdgpu_info helper +// ASM: .amdgpu_flags 2 +// ASM: .amdgpu_num_sgpr 8 +// ASM: .amdgpu_num_vgpr 10 +// ASM-NOT: .amdgpu_num_agpr +// ASM: .amdgpu_private_segment_size 16 +// ASM: .amdgpu_call extern_func +// ASM: .end_amdgpu_info + +// ASM: .amdgpu_info addr_taken_func +// ASM: .amdgpu_flags 0 +// ASM: .amdgpu_num_sgpr 2 +// ASM: .amdgpu_num_vgpr 4 +// ASM-NOT: .amdgpu_num_agpr +// ASM: .amdgpu_private_segment_size 0 +// ASM: .amdgpu_typeid "vi" +// ASM: .end_amdgpu_info + +// OBJ: Section { +// OBJ: Name: .amdgpu.info +// OBJ: Type: SHT_PROGBITS +// OBJ: Flags [ +// OBJ: SHF_EXCLUDE +// OBJ: ] +// OBJ: } + +// The string pool backs INFO_INDIRECT_CALL / INFO_TYPEID payloads. It is an +// ELF-convention SHT_STRTAB with a leading null byte at offset 0 and string +// deduplication -- both directives above reference the same "vi" TypeID, so +// it must appear exactly once starting at offset 1. +// OBJ: Section { +// OBJ: Name: .amdgpu.strtab +// OBJ: Type: SHT_STRTAB +// OBJ: Flags [ +// OBJ: SHF_EXCLUDE +// OBJ: ] +// OBJ: } + +// Relocations in .amdgpu.info should reference defined and external symbols. 
+// OBJ-DAG: R_AMDGPU_ABS64 my_kernel +// OBJ-DAG: R_AMDGPU_ABS64 helper +// OBJ-DAG: R_AMDGPU_ABS64 addr_taken_func +// OBJ-DAG: R_AMDGPU_ABS64 extern_func +// OBJ-DAG: R_AMDGPU_ABS64 lds_var + +// OBJ: String dump of section '.amdgpu.strtab': +// OBJ-NEXT: [{{ +}}1] vi +// OBJ-NOT: ] vi
