aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/MC/MCAssembler.cpp
AgeCommit message (Collapse)AuthorFilesLines
2024-06-09[MC] Remove the last MCFragment::getPrevNode use. NFCFangrui Song1-2/+4
2024-06-09[MC] Relax fragments eagerlyFangrui Song1-75/+42
Lazy relaxation caused hash table lookups (`getFragmentOffset`) and complex use/compute interdependencies. Some expressions involding forward declared symbols (e.g. `subsection-if.s`) cannot be computed. Recursion detection requires complex `IsBeingLaidOut` (https://reviews.llvm.org/D79570). D76114's `invalidateFragmentsFrom` makes lazy relaxation even less useful. Switch to eager relaxation to greatly simplify code and resolve these issues. This change also removes a `getPrevNode` use, which makes it more feasible to replace the fragment representation, which might yield a large peak RSS win. Minor downsides: The number of section relaxations may increase (offset by avoiding the hash table lookup). For relax-recompute-align.s, the computed layout is not optimal.
2024-01-18Revert "[BOLT] Fix unconditional output of boltedcollection in merge-fdata ↵Amir Ayupov1-81/+37
(#78653)" This reverts commit 82bc33ea3f1a539be50ed46919dc53fc6b685da9. Accidentally pushed unrelated changes.
2024-01-18[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)Amir Ayupov1-37/+81
Fix the bug where merge-fdata unconditionally outputs boltedcollection line, regardless of whether input files have it set. Test Plan: Added bolt/test/X86/merge-fdata-nobat-mode.test which fails without this fix.
2024-01-09[LoongArch] Support R_LARCH_{ADD,SUB}_ULEB128 for .uleb128 and force relocs ↵Jinyang He1-1/+5
when sym is not in section (#76433) 1, Follow RISCV 1df5ea29 to support generates relocs for .uleb128 which can not be folded. Unlike RISCV, the located content of LoongArch should be zero. LoongArch fixup uleb128 value by in-place addition and subtraction reloc types named R_LARCH_{ADD,SUB}_ULEB128. The located content can affect the result and R_LARCH_ADD_ULEB128 has enough info to represent the first symbol value, so it needs to be set to zero. 2, Force relocs if sym is not in section so that it can emit relocs for external symbol. Fixes: https://github.com/llvm/llvm-project/pull/72960#issuecomment-1866844679
2023-12-07[RISCV][MC] Pass MCSubtargetInfo down to shouldForceRelocation and ↵Craig Topper1-10/+12
evaluateTargetFixup. (#73721) Instead of using the STI stored in RISCVAsmBackend, try to get it from the MCFragment. This addresses the issue raised here https://discourse.llvm.org/t/possible-problem-related-to-subtarget-usage/75283
2023-11-09[RISCV] Support R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128 for .uleb128 directivesFangrui Song1-8/+26
For a label difference like `.uleb128 A-B`, MC folds A-B even if A and B are separated by a RISC-V linker-relaxable instruction. This incorrect behavior is currently abused by DWARF v5 .debug_loclists/.debug_rnglists (DW_LLE_offset_pair/DW_RLE_offset_pair entry kinds) implemented in Clang/LLVM (see https://github.com/ClangBuiltLinux/linux/issues/1719 for an instance). https://github.com/riscv-non-isa/riscv-elf-psabi-doc/commit/96d6e190e9fc04a8517f9ff7fb9aed3e9876cbd6 defined R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128. This patch generates such a pair of relocations to represent A-B that should not be folded. GNU assembler computes the directive size by ignoring shrinkable section content, therefore after linking the value of A-B cannot use more bytes than the reserved number (`final size of uleb128 value at offset ... exceeds available space`). We make the same assumption. ``` w1: call foo w2: .space 120 w3: .uleb128 w2-w1 # 1 byte, 0x08 .uleb128 w3-w1 # 2 bytes, 0x80 0x01 ``` We do not conservatively reserve 10 bytes (maximum size of an uleb128 for uint64_t) as that would pessimize DWARF v5 DW_LLE_offset_pair/DW_RLE_offset_pair, nullifying the benefits of introducing R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128 relocations. The supported expressions are limited. For example, * non-subtraction `.uleb128 A` is not allowed * `.uleb128 A-B`: report an error unless A and B are both defined and in the same section The new cl::opt `-riscv-uleb128-reloc` can be used to suppress the relocations. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D157657
2023-10-12Use llvm::endianness::{big,little,native} (NFC)Kazu Hirata1-1/+1
Note that llvm::support::endianness has been renamed to llvm::endianness while becoming an enum class as opposed to an enum. This patch replaces support::{big,little,native} with llvm::endianness::{big,little,native}.
2023-10-10Use llvm::endianness (NFC)Kazu Hirata1-1/+1
Now that llvm::support::endianness has been renamed to llvm::endianness, we can use the shorter form. This patch replaces support::endianness with llvm::endianness.
2023-08-09[MC] Use reportError for .uleb128/.sleb128 diagnosticFangrui Song1-2/+5
User errors should use reportError. reportError allows us to continue parsing the file and collect more diagnostics. MC/ELF/leb128-err.s is adapted from MC/RISCV/riscv64-64b-pcrel.s
2023-07-21[RISCV] Allow delayed decision for ADD/SUB relocationsFangrui Song1-1/+8
For a label difference `A-B` in assembly, if A and B are separated by a linker-relaxable instruction, we should emit a pair of ADD/SUB relocations (e.g. R_RISCV_ADD32/R_RISCV_SUB32, R_RISCV_ADD64/R_RISCV_SUB64). However, the decision is made upfront at parsing time with inadequate heuristics (`requiresFixup`). As a result, LLVM integrated assembler incorrectly suppresses R_RISCV_ADD32/R_RISCV_SUB32 for the following code: ``` // Simplified from a workaround https://android-review.googlesource.com/c/platform/art/+/2619609 // Both end and begin are not defined yet. We decide ADD/SUB relocations upfront and don't know they will be needed. .4byte end-begin begin: call foo end: ``` To fix the bug, make two primary changes: * Delete `requiresFixups` and the overridden emitValueImpl (from D103539). This deletion requires accurate evaluateAsAbolute (D153097). * In MCAssembler::evaluateFixup, call handleAddSubRelocations to emit ADD/SUB relocations. However, there is a remaining issue in MCExpr.cpp:AttemptToFoldSymbolOffsetDifference. With MCAsmLayout, we may incorrectly fold A-B even when A and B are separated by a linker-relaxable instruction. This deficiency is acknowledged (see D153097), but was previously bypassed by eagerly emitting ADD/SUB using `requiresFixups`. To address this, we partially reintroduce `canFold` (from D61584, removed by D103539). Some expressions (e.g. .size and .fill) need to take the `MCAsmLayout` code path in AttemptToFoldSymbolOffsetDifference, avoiding relocations (weird, but matching GNU assembler and needed to match user expectation). Switch to evaluateKnownAbsolute to leverage the `InSet` condition. As a bonus, this change allows for the removal of some relocations for the FDE `address_range` field in the .eh_frame section. riscv64-64b-pcrel.s contains the main test. Add a linker relaxable instruction to dwarf-riscv-relocs.ll to test what it intends to test. Merge fixups-relax-diff.ll into fixups-diff.ll. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D155357
2023-06-26[MC] Report location information for MCDwarfCallFrameFragment diagnosticsFangrui Song1-0/+1
2023-06-26[MC] Reject CFI advance_loc separated by a non-private label for Mach-OFangrui Song1-6/+10
Due to Mach-O's .subsections_via_symbols mechanism, non-private labels cannot appear between .cfi_startproc/.cfi_endproc. Compilers do not produce such labels, but hand-written assembly may. Give an error. Unfortunately, emitDwarfAdvanceFrameAddr generated MCExpr doesn't have location informatin. Note: evaluateKnownAbsolute is to force folding A-B to a constant even if A and B are separate by a non-private label. The function is a workaround for some Mach-O assembler issues and should generally be avoided. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D153167
2023-06-22[MC] Detect out of range jumps further than 2^32 bytesDaniel Hoekwater1-1/+1
On AArch64, object files may be greater than 2^32 bytes. If an offset is greater than the max value of a 32-bit unsigned integer, LLVM silently truncates the offset. Instead, make it return an error. Differential Revision: https://reviews.llvm.org/D153494
2023-06-16Revert "[RISCV] relaxDwarfCallFrameFragment: remove unneeded relocations for ↵Volodymyr Sapsai1-7/+6
relaxation" Failing buildbot https://green.lab.llvm.org/green/job/clang-stage1-RA/34684/ This reverts commit 11ebe3d906558d93a607347de472e7718127f409.
2023-06-15[RISCV] relaxDwarfCallFrameFragment: remove unneeded relocations for relaxationFangrui Song1-6/+7
If `evaluateAsAbsolute(Value, Layout.getAssembler())` returns true, we know the address delta is a constant and can suppress relocations (usually SET6/SUB6). While here, replace two evaluateKnownAbsolute calls (subtle; avoid if possible) with evaluateAsAbsolute.
2023-05-07MCDwarfFrameEmitter::EncodeAdvanceLoc: use SmallVectorImpl instead of ↵Fangrui Song1-2/+1
raw_ostream. NFC Similar to 49488490d195591bfc90daef111cd7293f8c80aa. Remove MCDwarfFrameEmitter::EmitAdvanceLoc which is only called once.
2023-05-07[MC] MCDwarfLineAddr::Encode: use SmallVectorImpl instead of raw_ostream. NFCFangrui Song1-3/+2
Similar to D145791: most call sites need a SmallString, but have to provide a raw_svector_ostream wrapper with unneeded abstraction and overhead: raw_ostream::write =(inlinable)=> flush_tied_then_write (unneeded TiedStream check) =(virtual function call)=> raw_svector_ostream::write_impl ==> SmallVector append(ItTy in_start, ItTy in_end) (range; less efficient then push_back). Just use SmallVectorImpl to simplify and optimize code. Unfortunately most call sites use SmallString, so we have to use SmallVectorImpl<char> instead of <uint8_t> to avoid large refactoring.
2023-05-04[MC] Optimize relaxInstruction: remove SmallVector copy. NFCFangrui Song1-11/+4
2023-05-04[MC] registerSymbol: change an output paramter to return valueFangrui Song1-5/+4
2023-04-06[MC] Always encode instruction into SmallVectorAlexis Engelke1-2/+1
All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream to encode the instruction into a SmallVector. The raw_ostream however incurs some overhead for the actual encoding. This change allows an MCCodeEmitter to directly emit an instruction into a SmallVector without using a raw_ostream and therefore allow for performance improvments in encoding. A default path that uses existing raw_ostream implementations is provided. Reviewed By: MaskRay, Amir Differential Revision: https://reviews.llvm.org/D145791
2022-07-11Revert "Rebase: [Facebook] [MC] Introduce NeverAlign fragment type"spupyrev1-81/+37
This reverts commit 6d0528636ae54fba75938a79ae7a98dfcc949f72.
2022-07-11Rebase: [Facebook] [MC] Introduce NeverAlign fragment typeRafael Auler1-37/+81
Summary: Introduce NeverAlign fragment type. The intended usage of this fragment is to insert it before a pair of macro-op fusion eligible instructions. NeverAlign fragment ensures that the next fragment (first instruction in the pair) does not end at a given alignment boundary by emitting a minimal size nop if necessary. In effect, it ensures that a pair of macro-fusible instructions is not split by a given alignment boundary, which is a precondition for macro-op fusion in modern Intel Cores (64B = cache line size, see Intel Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode Pipeline: Macro-Fusion). This patch introduces functionality used by BOLT when emitting code with MacroFusion alignment already in place. The use case is different from BoundaryAlign and instruction bundling: - BoundaryAlign can be extended to perform the desired alignment for the first instruction in the macro-op fusion pair (D101817). However, this approach has higher overhead due to reliance on relaxation as BoundaryAlign requires in the general case - see https://reviews.llvm.org/D97982#2710638. - Instruction bundling: the intent of NeverAlign fragment is to prevent the first instruction in a pair ending at a given alignment boundary, by inserting at most one minimum size nop. It's OK if either instruction crosses the cache line. Padding both instructions using bundles to not cross the alignment boundary would result in excessive padding. There's no straightforward way to request instruction bundling to avoid a given end alignment for the first instruction in the bundle. LLVM: https://reviews.llvm.org/D97982 Manual rebase conflict history: https://phabricator.intern.facebook.com/D30142613 Test Plan: sandcastle Reviewers: #llvm-bolt Subscribers: phabricatorlinter Differential Revision: https://phabricator.intern.facebook.com/D31361547
2022-06-15[NFC][Alignment] Use Align in MCAlignFragmentGuillaume Chatelet1-2/+2
2022-03-11[MC] Fix letter case of some MCSection member functionsFangrui Song1-2/+2
2022-02-09Cleanup LLVMMC headersserge-sans-paille1-3/+4
There's a few relevant forward declarations in there that may require downstream adding explicit includes: llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h llvm/MC/MCObjectStreamer.h no longer include llvm/MC/MCAssembler.h llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h Counting preprocessed lines required to rebuild llvm-project on my setup: before: 1052436830 after: 1049293745 Which is significant and backs up the change in addition to the usual benefits of decreasing coupling between headers and compilation units. Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D119244
2021-12-07[macho] add support for emitting macho files with two build version load ↵Alex Lorenz1-0/+3
commands This patch extends LLVM IR to add metadata that can be used to emit macho files with two build version load commands. It utilizes "darwin.target_variant.triple" and "darwin.target_variant.SDK Version" metadata names for that, which will be set by a future patch in clang. MachO uses two build version load commands to represent an object file / binary that is targeting both the macOS target, and the Mac Catalyst target. At runtime, a dynamic library that supports both targets can be loaded from either a native macOS or a Mac Catalyst app on a macOS system. We want to add support to this to upstream to LLVM to be able to build compiler-rt for both targets, to finish the complete support for the Mac Catalyst platform, which is right now targetable by upstream clang, but the compiler-rt bits aren't supported because of the lack of this multiple build version support. Differential Revision: https://reviews.llvm.org/D112189
2021-09-07[MC] Use local MCSubtargetInfo in writeNopsPeter Smith1-6/+11
On some architectures such as Arm and X86 the encoding for a nop may change depending on the subtarget in operation at the time of encoding. This change replaces the per module MCSubtargetInfo retained by the targets AsmBackend in favour of passing through the local MCSubtargetInfo in operation at the time. On Arm using the architectural NOP instruction can have a performance benefit on some implementations. For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to limit the chances of this causing problems in the future. I've not done this for other targets such as X86 as there is more frequent use of the MCSubtargetInfo and it looks to be for stable properties that we would not expect to vary per function. This change required threading STI through MCNopsFragment and MCBoundaryAlignFragment. I've attempted to take into account the in tree experimental backends. Differential Revision: https://reviews.llvm.org/D45962
2021-06-17RISCV: adjust handling of relocation emission for RISCVSaleem Abdulrasool1-60/+13
This re-architects the RISCV relocation handling to bring the implementation closer in line with the implementation in binutils. We would previously aggressively resolve the relocation. With this restructuring, we always will emit a paired relocation for any symbolic difference of the type of S±T[±C] where S and T are labels and C is a constant. GAS has a special target hook controlled by `RELOC_EXPANSION_POSSIBLE` which indicates that a fixup may be expanded into multiple relocations. This is used by the RISCV backend to always emit a paired relocation - either ADD[WIDTH] + SUB[WIDTH] for text relocations or SET[WIDTH] + SUB[WIDTH] for a debug info relocation. Irrespective of whether linker relaxation support is enabled, symbolic difference is always emitted as a paired relocation. This change also sinks the target specific behaviour down into the target specific area rather than exposing it to the shared relocation handling. In the process, we also sink the "special" handling for debug information down into the RISCV target. Although this improves the path for the other targets, this is not necessarily entirely ideal either. The changes in the debug info emission could be done through another type of hook as this functionality would be required by any other target which wishes to do linker relaxation. However, as there are no other targets in LLVM which currently do this, this is a reasonable thing to do until such time as the code needs to be shared. Improve the handling of the relocation (and add a reduced test case from the Linux kernel) to ensure that we handle complex expressions for symbolic difference. This ensures that we correct relocate symbols with the adddends normalized and associated with the addition portion of the paired relocation. This change also addresses some review comments from Alex Bradbury about the relocations meant for use in the DWARF CFA being named incorrectly (using ADD6 instead of SET6) in the original change which introduced the relocation type. This resolves the issues with the symbolic difference emission sufficiently to enable building the Linux kernel with clang+IAS+lld (without linker relaxation). Resolves PR50153, PR50156! Fixes: ClangBuiltLinux/linux#1023, ClangBuiltLinux/linux#1143 Reviewed By: nickdesaulniers, maskray Differential Revision: https://reviews.llvm.org/D103539
2021-01-21MCDwarf: Delete uneeded parameterFangrui Song1-4/+3
And change signature
2020-12-10[CSSPGO] Pseudo probe encoding and emission.Hongtao Yu1-0/+36
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead.  **ELF object emission** The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission. Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool. The format of `.pseudo_probe_desc` section looks like: ``` .section .pseudo_probe_desc,"",@progbits .quad 6309742469962978389 // Func GUID .quad 4294967295 // Func Hash .byte 9 // Length of func name .ascii "_Z5funcAi" // Func name .quad 7102633082150537521 .quad 138828622701 .byte 12 .ascii "_Z8funcLeafi" .quad 446061515086924981 .quad 4294967295 .byte 9 .ascii "_Z5funcBi" .quad -2016976694713209516 .quad 72617220756 .byte 7 .ascii "_Z3fibi" ``` For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format : ``` FUNCTION BODY (one for each outlined function present in the text section) GUID (uint64) GUID of the function NPROBES (ULEB128) Number of probes originating from this function. NUM_INLINED_FUNCTIONS (ULEB128) Number of callees inlined into this function, aka number of first-level inlinees PROBE RECORDS A list of NPROBES entries. Each entry contains: INDEX (ULEB128) TYPE (uint4) 0 - block probe, 1 - indirect call, 2 - direct call ATTRIBUTE (uint3) reserved ADDRESS_TYPE (uint1) 0 - code address, 1 - address delta CODE_ADDRESS (uint64 or ULEB128) code address or address delta, depending on ADDRESS_TYPE INLINED FUNCTION RECORDS A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined callees. Each record contains: INLINE SITE GUID of the inlinee (uint64) ID of the callsite probe (ULEB128) FUNCTION BODY A FUNCTION BODY entry describing the inlined function. ``` To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index. **Assembling** Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis. A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file. A example assembly looks like: ``` foo2: # @foo2 # %bb.0: # %bb0 pushq %rax testl %edi, %edi .pseudoprobe 837061429793323041 1 0 0 je .LBB1_1 # %bb.2: # %bb2 .pseudoprobe 837061429793323041 6 2 0 callq foo .pseudoprobe 837061429793323041 3 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq .LBB1_1: # %bb1 .pseudoprobe 837061429793323041 5 1 0 callq *%rsi .pseudoprobe 837061429793323041 2 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq # -- End function .section .pseudo_probe_desc,"",@progbits .quad 6699318081062747564 .quad 72617220756 .byte 3 .ascii "foo" .quad 837061429793323041 .quad 281547593931412 .byte 4 .ascii "foo2" ``` With inlining turned on, the assembly may look different around %bb2 with an inlined probe: ``` # %bb.2: # %bb2 .pseudoprobe 837061429793323041 3 0 .pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6 .pseudoprobe 837061429793323041 4 0 popq %rax retq ``` **Disassembling** We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file. An example disassembly looks like: ``` 00000000002011a0 <foo2>: 2011a0: 50 push rax 2011a1: 85 ff test edi,edi [Probe]: FUNC: foo2 Index: 1 Type: Block 2011a3: 74 02 je 2011a7 <foo2+0x7> [Probe]: FUNC: foo2 Index: 3 Type: Block [Probe]: FUNC: foo2 Index: 4 Type: Block [Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6 2011a5: 58 pop rax 2011a6: c3 ret [Probe]: FUNC: foo2 Index: 2 Type: Block 2011a7: bf 01 00 00 00 mov edi,0x1 [Probe]: FUNC: foo2 Index: 5 Type: IndirectCall 2011ac: ff d6 call rsi [Probe]: FUNC: foo2 Index: 4 Type: Block 2011ae: 58 pop rax 2011af: c3 ret ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91878
2020-12-10Revert "[CSSPGO] Pseudo probe encoding and emission."Mitch Phillips1-36/+0
This reverts commit b035513c06d1cba2bae8f3e88798334e877523e1. Reason: Broke the ASan buildbots: http://lab.llvm.org:8011/#/builders/5/builds/2269
2020-12-10[CSSPGO] Pseudo probe encoding and emission.Hongtao Yu1-0/+36
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead.  **ELF object emission** The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission. Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool. The format of `.pseudo_probe_desc` section looks like: ``` .section .pseudo_probe_desc,"",@progbits .quad 6309742469962978389 // Func GUID .quad 4294967295 // Func Hash .byte 9 // Length of func name .ascii "_Z5funcAi" // Func name .quad 7102633082150537521 .quad 138828622701 .byte 12 .ascii "_Z8funcLeafi" .quad 446061515086924981 .quad 4294967295 .byte 9 .ascii "_Z5funcBi" .quad -2016976694713209516 .quad 72617220756 .byte 7 .ascii "_Z3fibi" ``` For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format : ``` FUNCTION BODY (one for each outlined function present in the text section) GUID (uint64) GUID of the function NPROBES (ULEB128) Number of probes originating from this function. NUM_INLINED_FUNCTIONS (ULEB128) Number of callees inlined into this function, aka number of first-level inlinees PROBE RECORDS A list of NPROBES entries. Each entry contains: INDEX (ULEB128) TYPE (uint4) 0 - block probe, 1 - indirect call, 2 - direct call ATTRIBUTE (uint3) reserved ADDRESS_TYPE (uint1) 0 - code address, 1 - address delta CODE_ADDRESS (uint64 or ULEB128) code address or address delta, depending on ADDRESS_TYPE INLINED FUNCTION RECORDS A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined callees. Each record contains: INLINE SITE GUID of the inlinee (uint64) ID of the callsite probe (ULEB128) FUNCTION BODY A FUNCTION BODY entry describing the inlined function. ``` To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index. **Assembling** Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis. A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file. A example assembly looks like: ``` foo2: # @foo2 # %bb.0: # %bb0 pushq %rax testl %edi, %edi .pseudoprobe 837061429793323041 1 0 0 je .LBB1_1 # %bb.2: # %bb2 .pseudoprobe 837061429793323041 6 2 0 callq foo .pseudoprobe 837061429793323041 3 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq .LBB1_1: # %bb1 .pseudoprobe 837061429793323041 5 1 0 callq *%rsi .pseudoprobe 837061429793323041 2 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq # -- End function .section .pseudo_probe_desc,"",@progbits .quad 6699318081062747564 .quad 72617220756 .byte 3 .ascii "foo" .quad 837061429793323041 .quad 281547593931412 .byte 4 .ascii "foo2" ``` With inlining turned on, the assembly may look different around %bb2 with an inlined probe: ``` # %bb.2: # %bb2 .pseudoprobe 837061429793323041 3 0 .pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6 .pseudoprobe 837061429793323041 4 0 popq %rax retq ``` **Disassembling** We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file. An example disassembly looks like: ``` 00000000002011a0 <foo2>: 2011a0: 50 push rax 2011a1: 85 ff test edi,edi [Probe]: FUNC: foo2 Index: 1 Type: Block 2011a3: 74 02 je 2011a7 <foo2+0x7> [Probe]: FUNC: foo2 Index: 3 Type: Block [Probe]: FUNC: foo2 Index: 4 Type: Block [Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6 2011a5: 58 pop rax 2011a6: c3 ret [Probe]: FUNC: foo2 Index: 2 Type: Block 2011a7: bf 01 00 00 00 mov edi,0x1 [Probe]: FUNC: foo2 Index: 5 Type: IndirectCall 2011ac: ff d6 call rsi [Probe]: FUNC: foo2 Index: 4 Type: Block 2011ae: 58 pop rax 2011af: c3 ret ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91878
2020-10-29[MC] Fix an assert in MCAssembler::writeSectionData to be aware of errorsFangrui Song1-1/+2
If MCContext has an error, MCAssembler::layout may stop early and some MCFragment's may not finalize. In the Linux kernel, arch/x86/lib/memcpy_64.S could trigger the assert before "x86_64: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S"
2020-09-11[MC] Allow .org directives in SHT_NOBITS sectionsFangrui Song1-0/+2
This is used by kvm-unit-tests and can be trivially supported.
2020-08-03[X86] support .nops directiveJian Cai1-2/+44
Add support of .nops on X86. This addresses llvm.org/PR45788. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D82826
2020-07-09[MC] Simplify the logic of applying fixup for fragments, NFCIShengchen Kan1-36/+45
Replace mutiple `if else` clauses with a `switch` clause and remove redundant checks. Before this patch, we need to add a statement like `if(!isa<MCxxxFragment>(Frag)) ` here each time we add a new kind of `MCEncodedFragment` even if it has no fixups. After this patch, we don't need to do that. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D83366
2020-06-25[MC] Fix PR45805: infinite recursion in assemblerThomas Preud'homme1-0/+4
Give up folding an expression if the fragment of one of the operands would require laying out a fragment already being laid out. This prevents hitting an infinite recursion when a fill size expression refers to a later fragment since computing the offset of that fragment would require laying out the fill fragment and thus computing its size expression. Reviewed By: echristo Differential Revision: https://reviews.llvm.org/D79570
2020-04-21[MC][Bugfix] Remove redundant parameter for relaxInstructionShengchen Kan1-2/+2
Summary: Before this patch, `relaxInstruction` takes three arguments, the first argument refers to the instruction before relaxation and the third argument is the output instruction after relaxation. There are two quite strange things: 1) The first argument's type is `const MCInst &`, the third argument's type is `MCInst &`, but they may be aliased to the same variable 2) The backends of ARM, AMDGPU, RISC-V, Hexagon assume that the third argument is a fresh uninitialized `MCInst` even if `relaxInstruction` may be called like `relaxInstruction(Relaxed, STI, Relaxed)` in a loop. In this patch, we drop the thrid argument, and let `relaxInstruction` directly modify the given instruction. Also, this patch fixes the bug https://bugs.llvm.org/show_bug.cgi?id=45580, which is introduced by D77851, and breaks the assumption of ARM, AMDGPU, RISC-V, Hexagon. Reviewers: Razer6, MaskRay, jyknight, asb, luismarques, enderby, rtaylor, colinl, bcain Reviewed By: Razer6, MaskRay, bcain Subscribers: bcain, nickdesaulniers, nathanchance, wuzish, annita.zhang, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, tpr, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78364
2020-04-17MCObjectWriter.h - remove Endian.h/EndianStream.h/raw_ostream.h includes. NFCSimon Pilgrim1-0/+1
Push these includes down to the the writers that actually need them, a number of which were implicitly relying on the MCObjectWriter.h.
2020-04-15[MC][COFF][ELF] Reject instructions in ↵Fangrui Song1-3/+8
IMAGE_SCN_CNT_UNINITIALIZED_DATA/SHT_NOBITS sections For `.bss; nop`, MC inappropriately calls abort() (via report_fatal_error()) with a message `cannot have fixups in virtual section!` It is a bug to crash for invalid user input. Fix it by erroring out early in EmitInstToData(). Similarly, emitIntValue() in a virtual section (SHT_NOBITS in ELF) can crash with the mssage `non-zero initializer found in section '.bss'` (see D4199) It'd be nice to report the location but so many directives can call emitIntValue() and it is difficult to track every location. Note, COFF does not crash because MCAssembler::writeSectionData() is not called for an IMAGE_SCN_CNT_UNINITIALIZED_DATA section. Note, GNU as' arm64 backend reports ``Error: attempt to store non-zero value in section `.bss'`` for a non-zero .inst but fails to do so for other instructions. We simply reject all instructions, even if the encoding is all zeros. The Mach-O counterpart is D48517 (see `test/MC/MachO/zerofill-text.s`) Reviewed By: rnk, skan Differential Revision: https://reviews.llvm.org/D78138
2020-04-15[MC] Replace MCSection*::getName() with MCSection::getName(). NFCFangrui Song1-5/+2
I plan to use MCSection::getName() in D78138. Having the function in the base class is also convenient for debugging. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D78251
2020-04-15[MC] Rename MCSection*::getSectionName() to getName(). NFCFangrui Song1-1/+1
A pending change will merge MCSection*::getName() to MCSection::getName().
2020-03-17[MC] Recalculate fragment offsets after relaxationJian Cai1-1/+7
Summary: The current relaxation implementation is not correctly adjusting the size and offsets of fragements in one section based on changes in size of another if the layout order of the two happened to be such that the former was visited before the later. Therefore, we need to invalidate the fragments in all sections after each iteration of relaxation, and possibly further relax some of them in the next ieration. This fixes PR#45190. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76114
2020-03-12[X86] Reduce the number of emitted fragments due to branch alignShengchen Kan1-13/+8
Summary: Currently, a BoundaryAlign fragment may be inserted after the branch that needs to be aligned to truncate the current fragment, this fragment is unused at most of time. To avoid that, we can insert a new empty Data fragment instead. Non-relaxable instruction is usually emitted into Data fragment, so the inserted empty Data fragment will be reused at a high possibility. Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight Reviewed By: reames, LuoYuanke Subscribers: llvm-commits, dexonsmith, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D75438
2020-03-03Temporarily Revert [X86] Not track size of the boudaryalign fragment during ↵Shengchen Kan1-57/+70
the layout Summary: This reverts commit 2ac19feb1571960b8e1479a451b45ab56da7034e. This commit causes some test cases to run fail when branch is aligned.
2020-03-02Use range-for in MCAssembler [NFC]Philip Reames1-5/+4
2020-03-02[X86] Not track size of the boudaryalign fragment during the layoutShengchen Kan1-70/+57
Summary: Currently the boundaryalign fragment caches its size during the process of layout and then it is relaxed and update the size in each iteration. This behaviour is unnecessary and ugly. Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight Reviewed By: MaskRay Subscribers: hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75404
2020-02-27[MC][ARM] Resolve some pcrel fixups at assembly time (PR44929)Hans Wennborg1-2/+4
MC currently does not emit these relocation types, and lld does not handle them. Add FKF_Constant as a work-around of some ARM code after D72197. Eventually we probably should implement these relocation types. By Fangrui Song! Differential revision: https://reviews.llvm.org/D72892
2020-02-26[MC] Pull out a relaxFragment helper [NFC]Philip Reames1-33/+25
Having this as it's own function helps to reduce indentation and allows use of return instead of wiring a value over the switch. A lambda would have also worked, but with slightly deeper nesting.