aboutsummaryrefslogtreecommitdiff
path: root/clang/lib/CodeGen/BackendUtil.cpp
AgeCommit message (Collapse)AuthorFilesLines
2024-12-13[rtsan][clang] NFC: Move rtsan init to addSanitizers (#119904)Chris Apple1-3/+3
2024-12-13[rtsan][llvm] Remove function pass, only support module pass (#119739)Chris Apple1-8/+2
Most of the other sanitizers are now only module level passes. This moves all functionality into the module pass, and removes the function pass.
2024-12-07[ubsan] Improve lowering of @llvm.allow.ubsan.check (#119013)Vitaly Buka1-7/+6
This fix the case, when single hot inlined callsite, prevent checks for all other. This helps to reduce number of removed checks up to 50% (deppedes on `cutoff-hot` value) . `ScalarOptimizerLateEPCallback` was happening during CGSCC walk, after each inlining, but this is effectively after inlining. Example, order in comments: ``` static void overflow() { // 1. Inline get/set if possible // 2. Simplify // 3. LowerAllowCheckPass set(get() + get()); } void test() { // 4. Inline // 5. Nothing for LowerAllowCheckPass overflow(); } ``` With this patch it will look like: ``` static void overflow() { // 1. Inline get/set if possible // 2. Simplify set(get() + get()); } void test() { // 3. Inline // 4. Simplify overflow(); } // Later, after inliner CGSCC walk complete: // 5. LowerAllowCheckPass for `overflow` // 6. LowerAllowCheckPass for `test` ```
2024-12-06[LLVM][rtsan] Add module pass to initialize rtsan (#118989)Chris Apple1-1/+3
This allows shared libraries instrumented with RTSan to be initialized. This approach directly mirrors the approach in Tsan, Asan and many of the other sanitizers
2024-11-16[CodeGen] Remove unused includes (NFC) (#116459)Kazu Hirata1-5/+0
Identified with misc-include-cleaner.
2024-11-03[PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline (#114577)Shilei Tian1-10/+12
2024-11-03[PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs ↵Shilei Tian1-1/+2
(#114547) The early simplication pipeline is used in non-LTO and (Thin/Full)LTO pre-link stage. There are some passes that we want them in non-LTO mode, but not at LTO pre-link stage. The control is missing currently. This PR adds the support. To demonstrate the use, we only enable the internalization pass in non-LTO mode for AMDGPU because having it run in pre-link stage causes some issues.
2024-10-30[llvm] Allow always dropping all llvm.type.test sequencesPaul Kirth1-3/+4
Currently, the `DropTypeTests` parameter only fully works with phi nodes and llvm.assume instructions. However, we'd like CFI to work in conjunction with FatLTO, in so far as the bitcode section should be able to contain the CFI instrumentation, while any incompatible bits are dropped when compiling the object code. To do that, we need to drop the llvm.type.test instructions everywhere, and not just their uses in phi nodes. This patch updates the LowerTypeTest pass so that uses are removed, and replaced with `true` in all cases, and not just in phi nodes. Addressing this will allow us to fix #112053 by modifying the FatLTO pipeline. Reviewers: pcc, nikic Reviewed By: pcc Pull Request: https://github.com/llvm/llvm-project/pull/112787
2024-10-09[CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds (#90933)Kyungwoo Lee1-4/+5
This feature is enabled by `-codegen-data-thinlto-two-rounds`, which effectively runs the `-codegen-data-generate` and `-codegen-data-use` in two rounds to enable global outlining with ThinLTO. 1. The first round: Run both optimization + codegen with a scratch output. Before running codegen, we serialize the optimized bitcode modules to a temporary path. 2. From the scratch object files, we merge them into the codegen data. 3. The second round: Read the optimized bitcode modules and start the codegen only this time. Using the codegen data, the machine outliner effectively performs the global outlining. Depends on #90934, #110461 and #110463. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
2024-10-03[CGData][ThinLTO][NFC] Prep for two-codegen rounds (#90934)Kyungwoo Lee1-4/+4
This is NFC for https://github.com/llvm/llvm-project/pull/90933. - Create a lambda function, `RunBackends`, to group the backend operations into a single function. - Explicitly pass the `CodeGenOnly` argument to thinBackend, instead of depending on a configuration value. Depends on https://github.com/llvm/llvm-project/pull/90304. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
2024-09-25Reapply "Deprecate the `-fbasic-block-sections=labels` option." (#110039)Rahman Lavaee1-1/+0
This reapplies commit 1911a50fae8a441b445eb835b98950710d28fc88 with a minor fix in lld/ELF/LTO.cpp which sets Options.BBAddrMap when `--lto-basic-block-sections=labels` is passed.
2024-09-25Revert "Deprecate the `-fbasic-block-sections=labels` option. (#107494)"Kazu Hirata1-0/+1
This reverts commit 1911a50fae8a441b445eb835b98950710d28fc88. Several bots are failing: https://lab.llvm.org/buildbot/#/builders/190/builds/6519 https://lab.llvm.org/buildbot/#/builders/3/builds/5248 https://lab.llvm.org/buildbot/#/builders/18/builds/4463
2024-09-25Deprecate the `-fbasic-block-sections=labels` option. (#107494)Rahman Lavaee1-1/+0
This feature is supported via the newer option `-fbasic-block-address-map`. Using the old option still works by delegating to the newer option, while a warning is printed to show deprecation.
2024-09-24[clang] Add cc1 --output-asm-variant= to set output syntaxFangrui Song1-0/+2
2fcaa549a824efeb56e807fcf750a56bf985296b (2010) added cc1as option `-output-asm-variant` (untested) to set the output syntax. `clang -cc1as -filetype asm -output-asm-variant 1` allows AT&T input and Intel output (`AssemblerDialect` is also used by non-x86 targets). This patch renames the cc1as option (to avoid collision with -o) and makes it available for cc1 to set output syntax. This allows different input & output syntax: ``` echo 'asm("mov $1, %eax");' | clang -xc - -S -o - -Xclang --output-asm-variant=1 ``` Note: `AsmWriterFlavor` (with a misleading name), used to initialize MCAsmInfo::AssemblerDialect, is primarily used for assembly input, not for output. Therefore, `echo 'asm("mov $1, %eax");' | clang -x c - -mllvm --x86-asm-syntax=intel -S -o -`, which achieves a similar goal before Clang 19, was unintended. Close #109157 Pull Request: https://github.com/llvm/llvm-project/pull/109360
2024-09-19[clang] Don't call raw_string_ostream::flush() (NFC)Youngsuk Kim1-1/+0
Don't call raw_string_ostream::flush(), which is essentially a no-op. As specified in the docs, raw_string_ostream is always unbuffered
2024-09-16[CodeView] Flatten cmd args in frontend for LF_BUILDINFO (#106369)nebulark1-2/+38
2024-09-15[Instrumentation] Move out to Utils (NFC) (#108532)Antonio Frighetto1-1/+0
Utility functions have been moved out to Utils. Minor opportunity to drop the header where not needed.
2024-09-01[LTO] Reduce memory usage for import lists (#106772)Kazu Hirata1-1/+2
This patch reduces the memory usage for import lists by employing memory-efficient data structures. With this patch, an import list for a given destination module is basically DenseSet<uint32_t> with each element indexing into the deduplication table containing tuples of: {SourceModule, GUID, Definition/Declaration} In one of our large applications, the peak memory usage goes down by 9.2% from 6.120GB to 5.555GB during the LTO indexing step. This patch addresses several sources of space inefficiency associated with std::unordered_map: - std::unordered_map<GUID, ImportKind> takes up 16 bytes because of padding even though ImportKind only carries one bit of information. - std::unordered_map uses pointers to elements, both in the hash table proper and for collision chains. - We allocate an instance of std::unordered_map for each {Destination Module, Source Module} pair for which we have at least one import. Most import lists have less than 10 imports, so the metadata like the size of std::unordered_map and the pointer to the hash table costs a lot relative to the actual contents.
2024-08-29[flang][Driver] Add support for -mllvm -print-pipeline-passesTarun Prabhu1-0/+2
The behavior deliberately mimics that of clang. Ideally, -print-pipeline-passes should be a first-class driver option. Notes to this effect have been added in the appropriate places in both flang and clang. --------- Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>
2024-08-23[clang][rtsan] Reland realtime sanitizer codegen and driver (#102622) Chris Apple1-0/+8
This reverts commit a1e9b7e646b76bf844e8a9a101ebd27de11992ff This relands commit d010ec6af8162a8ae4e42d2cac5282f83db0ce07 No modifications from the original patch. It was determined that the ubsan build failure was happening even after the revert, some examples: https://lab.llvm.org/buildbot/#/builders/159/builds/4477 https://lab.llvm.org/buildbot/#/builders/159/builds/4478 https://lab.llvm.org/buildbot/#/builders/159/builds/4479
2024-08-22Revert "[clang][rtsan] Introduce realtime sanitizer codegen and drive… ↵Chris Apple1-8/+0
(#105744) …r (#102622)" This reverts commit d010ec6af8162a8ae4e42d2cac5282f83db0ce07. Build failure: https://lab.llvm.org/buildbot/#/builders/159/builds/4466
2024-08-22[clang][rtsan] Introduce realtime sanitizer codegen and driver (#102622)Chris Apple1-0/+8
Introduce the `-fsanitize=realtime` flag in clang driver Plug in the RealtimeSanitizer PassManager pass in Codegen, and attribute a function based on if it has the `[[clang::nonblocking]]` function effect.
2024-08-22[Driver] Add -Wa, options -mmapsyms={default,implicit}Fangrui Song1-0/+1
-Wa,-mmapsyms=implicit enables the alternative mapping symbol scheme discussed at #99718. While not conforming to the current aaelf64 ABI, the option is invaluable for those with full control over their toolchain, no reliance on weird relocatable files, and a strong focus on minimizing both relocatable and executable sizes. The option is discouraged when portability of the relocatable objects is a concern. https://maskray.me/blog/2024-07-21-mapping-symbols-rethinking-for-efficiency elaborates the risk. Pull Request: https://github.com/llvm/llvm-project/pull/104542
2024-08-15[Driver] Make CodeGenOptions name match MCTargetOptions namesFangrui Song1-1/+1
* Initialize `X86RelaxRelocations`. * Fix #96860 test to actually test -Wa,-msse2avx for non-x86.
2024-08-09[LTO] enable `ObjCARCContractPass` only on optimized build (#101114)Peter Rong1-6/+0
\#92331 tried to make `ObjCARCContractPass` by default, but it caused a regression on O0 builds and was reverted. This patch trys to bring that back by: 1. reverts the [revert](https://github.com/llvm/llvm-project/commit/1579e9ca9ce17364963861517fecf13b00fe4d8a). 2. `createObjCARCContractPass` only on optimized builds. Tests are updated to refelect the changes. Specifically, all `O0` tests should not include `ObjCARCContractPass` Signed-off-by: Peter Rong <PeterRong@meta.com>
2024-07-03[Driver] Add -Wa, options --crel and --allow-experimental-crelFangrui Song1-0/+1
The two options are discussed in a few comments around https://github.com/llvm/llvm-project/pull/91280#issuecomment-2099344079 * -Wa,--crel: error "-Wa,--allow-experimental-crel must be specified to use -Wa,--crel..." * -Wa,--allow-experimental-crel: no-op * -Wa,--crel,--allow-experimental-crel: enable CREL in the integrated assembler (#91280) MIPS's little-endian n64 ABI messed up the `r_info` field in relocations. While this could be fixed with CREL, my intention is to avoid complication in assembler/linker. The implementation simply doesn't allow CREL for MIPS. Link: https://discourse.llvm.org/t/rfc-crel-a-compact-relocation-format-for-elf/77600 Pull Request: https://github.com/llvm/llvm-project/pull/97378
2024-07-02[Clang] Enable nsan instrumentation pass (#97359)Alexander Shaposhnikov1-0/+4
Enable nsan instrumentation pass
2024-06-28[clang][CodeGen] Remove unnecessary ShouldLinkFiles conditional (#96951)Jacob Lambert1-1/+1
We have reworked the bitcode linking option to no longer link twice if post-optimization linking is requested. As such, we no longer need to conditionally link bitcodes supplied via -mlink-bitcode-file, as there is no danger of linking them twice
2024-05-31[EntryExitInstrumenter] Move passes out of clang into LLVM default pipelines ↵Egor Pasko1-17/+0
(#92171) Move EntryExitInstrumenter(PostInlining=true) to as late as possible and EntryExitInstrumenter(PostInlining=false) to an early pre-inlining stage (but skip for ThinLTO post-link). This should fix the issues reported in https://github.com/rust-lang/rust/issues/92109 and https://github.com/llvm/llvm-project/issues/52853. These are caused by https://reviews.llvm.org/D97608.
2024-05-24Revert "Run ObjCContractPass in Default Codegen Pipeline (#92331)"Nikita Popov1-0/+6
This reverts commit 8cc8e5d6c6ac9bfc888f3449f7e424678deae8c2. This reverts commit dae55c89835347a353619f506ee5c8f8a2c136a7. Causes major compile-time regressions for unoptimized builds.
2024-05-23Run ObjCContractPass in Default Codegen Pipeline (#92331)Nuri Amari1-6/+0
Prior to this patch, when using -fthinlto-index= the ObjCARCContractPass isn't run prior to CodeGen, and instruction selection fails on IR containing arc intrinsics. This patch is motivated by that usecase. The pass was previously added in various places codegen is performed. This patch adds the pass to the default codegen pipepline, makes sure it bails immediately if no arc intrinsics are found, and removes the adhoc scheduling of the pass. Co-authored-by: Nuri Amari <nuriamari@fb.com>
2024-05-08[clang][CodeGen] Omit pre-opt link when post-opt is link requested (#85672)Jacob Lambert1-10/+2
Currently, when the -relink-builtin-bitcodes-postop option is used we link builtin bitcodes twice: once before optimization, and again after optimization. With this change, we omit the pre-opt linking when the option is set, and we rename the option to the following: -Xclang -mlink-builtin-bitcodes-postopt (-Xclang -mno-link-builtin-bitcodes-postopt) The goal of this change is to reduce compile time. We do lose the theoretical benefits of pre-opt linking, but in practice these are small than the overhead of linking twice. However we may be able to address this in a future patch by adjusting the position of the builtin-bitcode linking pass. Compilations not setting the option are unaffected
2024-05-07[Clang] -fseparate-named-sections option (#91028)Petr Hosek1-0/+1
When set, the compiler will use separate unique sections for global symbols in named special sections (e.g. symbols that are annotated with __attribute__((section(...)))). Doing so enables linker GC to collect unused symbols without having to use a different section per-symbol.
2024-04-19[clang] Add flag to experiment with cold function attributes (#89298)Arthur Eubanks1-22/+35
To be removed and promoted to a proper driver flag if experiments turn out fruitful. For now, this can be experimented with `-mllvm -pgo-cold-func-opt=[optsize|minsize|optnone|default] -mllvm -enable-pgo-force-function-attrs`. Original LLVM patch for this functionality: #69030
2024-04-04[UBSAN] Rename `remove-traps` to `lower-allow-check` (#84853)Vitaly Buka1-3/+3
2024-04-04[clang][CodeGen] Remove SimplifyCFGPass preceding RemoveTrapsPass (#84852)Vitaly Buka1-6/+0
There is no performance difference after switching to `llvm.experimental.hot`.
2024-04-04[UBSAN][HWASAN] Remove redundant flags (#87709)Vitaly Buka1-4/+1
Presense of `cutoff-hot` or `random-skip-rate` should be enough to trigger optimization.
2024-04-02[CodeGen][NFC] Make an opt<> staticVitaly Buka1-2/+2
2024-04-01[ubsan][NFC] Remove recently added `cl::init(false)`Vitaly Buka1-4/+3
Extracted from #84858
2024-03-11[clang] Add optional pass to remove UBSAN traps using PGO (#84214)Vitaly Buka1-0/+21
With #83471 it reduces UBSAN overhead from 44% to 6%. Measured as "Geomean difference" on "test-suite/MultiSource/Benchmarks" with PGO build. On real large server binary we see 95% of code is still instrumented, with 10% -> 1.5% UBSAN overhead improvements. We can pass this test only with subset of UBSAN, so base overhead is smaller. We have followup patches to improve it even further.
2024-03-06[MC] Move CompressDebugSections/RelaxELFRelocations from ↵Fangrui Song1-2/+3
TargetOptions/MCAsmInfo to MCTargetOptions The convention is for such MC-specific options to reside in MCTargetOptions. However, CompressDebugSections/RelaxELFRelocations do not follow the convention: `CompressDebugSections` is defined in both TargetOptions and MCAsmInfo and there is forwarding complexity. Move the option to MCTargetOptions and hereby simplify the code. Rename the misleading RelaxELFRelocations to X86RelaxRelocations. llvm-mc -relax-relocations and llc -x86-relax-relocations can now be unified.
2024-02-28[clang][fat-lto-objects] Make module flags match non-FatLTO pipelines (#83159)Paul Kirth1-16/+16
In addition to being rather hard to follow, there isn't a good reason why FatLTO shouldn't just share the same code for setting module flags for (Thin)LTO. This patch simplifies the logic and makes sure we use set these flags in a consistent way, independent of FatLTO. Additionally, we now test that output in the .llvm.lto section actually matches the output from Full and Thin LTO compilation.
2024-02-12[PGO] Add ability to mark cold functions as optsize/minsize/optnone (#69030)Arthur Eubanks1-6/+12
The performance of cold functions shouldn't matter too much, so if we care about binary sizes, add an option to mark cold functions as optsize/minsize for binary size, or optnone for compile times [1]. Clang patch will be in a future patch. This is intended to replace `shouldOptimizeForSize(Function&, ...)`. We've seen multiple cases where calls to this expensive function, if not careful, can blow up compile times. I will clean up users of that function in a followup patch. Initial version: https://reviews.llvm.org/D149800 [1] https://discourse.llvm.org/t/rfc-new-feature-proposal-de-optimizing-cold-functions-using-pgo-info/56388
2024-02-01[SHT_LLVM_BB_ADDR_MAP] Allow basic-block-sections and labels be used ↵Rahman Lavaee1-0/+1
together by decoupling the handling of the two features. (#74128) Today `-split-machine-functions` and `-fbasic-block-sections={all,list}` cannot be combined with `-basic-block-sections=labels` (the labels option will be ignored). The inconsistency comes from the way basic block address map -- the underlying mechanism for basic block labels -- encodes basic block addresses (https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html). Specifically, basic block offsets are computed relative to the function begin symbol. This relies on functions being contiguous which is not the case for MFS and basic block section binaries. This means Propeller cannot use binary profiles collected from these binaries, which limits the applicability of Propeller for iterative optimization. To make the `SHT_LLVM_BB_ADDR_MAP` feature work with basic block section binaries, we propose modifying the encoding of this section as follows. First let us review the current encoding which emits the address of each function and its number of basic blocks, followed by basic block entries for each basic block. | | | |--|--| | Address of the function | Function Address | | Number of basic blocks in this function | NumBlocks | | BB entry 1 | BB entry 2 | ... | BB entry #NumBlocks To make this work for basic block sections, we treat each basic block section similar to a function, except that basic block sections of the same function must be encapsulated in the same structure so we can map all of them to their single function. We modify the encoding to first emit the number of basic block sections (BB ranges) in the function. Then we emit the address map of each basic block section section as before: the base address of the section, its number of blocks, and BB entries for its basic block. The first section in the BB address map is always the function entry section. | | | |--|--| | Number of sections for this function | NumBBRanges | | Section 1 begin address | BaseAddress[1] | | Number of basic blocks in section 1 | NumBlocks[1] | | BB entries for Section 1 |..................| | Section #NumBBRanges begin address | BaseAddress[NumBBRanges] | | Number of basic blocks in section #NumBBRanges | NumBlocks[NumBBRanges] | | BB entries for Section #NumBBRanges The encoding of basic block entries remains as before with the minor change that each basic block offset is now computed relative to the begin symbol of its containing BB section. This patch adds a new boolean codegen option `-basic-block-address-map`. Correspondingly, the front-end flag `-fbasic-block-address-map` and LLD flag `--lto-basic-block-address-map` are introduced. Analogously, we add a new TargetOption field `BBAddrMap`. This means BB address maps are either generated for all functions in the compiling unit, or for none (depending on `TargetOptions::BBAddrMap`). This patch keeps the functionality of the old `-fbasic-block-sections=labels` option but does not remove it. A subsequent patch will remove the obsolete option. We refactor the `BasicBlockSections` pass by separating the BB address map and BB sections handing to their own functions (named `handleBBAddrMap` and `handleBBSections`). `handleBBSections` renumbers basic blocks and places them in their assigned sections. `handleBBAddrMap` is invoked after `handleBBSections` (if requested) and only renumbers the blocks. - New tests added: - Two tests basic-block-address-map-with-basic-block-sections.ll and basic-block-address-map-with-mfs.ll to exercise the combination of `-basic-block-address-map` with `-basic-block-sections=list` and '-split-machine-functions`. - A driver sanity test for the `-fbasic-block-address-map` option (basic-block-address-map.c). - An LLD test for testing the `--lto-basic-block-address-map` option. This reuses the LLVM IR from `lld/test/ELF/lto/basic-block-sections.ll`. - Renamed and modified the two existing codegen tests for basic block address map (`basic-block-sections-labels-functions-sections.ll` and `basic-block-sections-labels.ll`) - Removed `SHT_LLVM_BB_ADDR_MAP_V0` tests. Full deprecation of `SHT_LLVM_BB_ADDR_MAP_V0` and `SHT_LLVM_BB_ADDR_MAP` version less than 2 will happen in a separate PR in a few months.
2024-01-26[Driver,CodeGen] Support -mtls-dialect= (#79256)Fangrui Song1-0/+1
GCC supports -mtls-dialect= for several architectures to select TLSDESC. This patch supports the following values * x86: "gnu". "gnu2" (TLSDESC) is not supported yet. * RISC-V: "trad" (general dynamic), "desc" (TLSDESC, see #66915) AArch64 toolchains seem to support TLSDESC from the beginning, and the general dynamic model has poor support. Nobody seems to use the option -mtls-dialect= at all, so we don't bother with it. There also seems very little interest in AArch32's TLSDESC support. TLSDESC does not change IR, but affects object file generation. Without a backend option the option is a no-op for in-process ThinLTO. There seems no motivation to have fine-grained control mixing trad/desc for TLS, so we just pass -mllvm, and don't bother with a modules flag metadata or function attribute. Co-authored-by: Paul Kirth <paulkirth@google.com>
2024-01-23[clang][FatLTO] Avoid UnifiedLTO until it can support WPD/CFI (#79061)Paul Kirth1-4/+4
Currently, the UnifiedLTO pipeline seems to have trouble with several LTO features, like SplitLTO units, which means we cannot use important optimizations like Whole Program Devirtualization or security hardening instrumentation like CFI. This patch reverts FatLTO to using distinct pipelines for Full LTO and ThinLTO. It still avoids module cloning, since that was error prone.
2023-12-22[NFC][CLANG] Fix static analyzer bugs about unnecessary object copies with ↵smanna121-1/+1
auto keyword (#75082) Reported by Static Analyzer Tool: In ​EmitAssemblyHelper::​RunOptimizationPipeline(): Using the auto keyword without an & causes the copy of an object of type function. /// List of pass builder callbacks ("CodeGenOptions.h"). std::vector<std::function<void(llvm::PassBuilder &)>> PassBuilderCallbacks;
2023-12-18[clang][fatlto] Don't set ThinLTO module flag with FatLTO (#75079)Paul Kirth1-4/+1
Since FatLTO now uses the UnifiedLTO pipeline, we should not set the ThinLTO module flag to true, since it may cause an assertion failure. See https://github.com/llvm/llvm-project/issues/70703 for context.
2023-12-14[Profile] Add binary profile correlation for code coverage. (#69493)Zequan Wu1-3/+8
## Motivation Since we don't need the metadata sections at runtime, we can somehow offload them from memory at runtime. Initially, I explored [debug info correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113), which is used for PGO with value profiling disabled. However, it currently only works with DWARF and it's be hard to add such artificial debug info for every function in to CodeView which is used on Windows. So, offloading profile metadata sections at runtime seems to be a platform independent option. ## Design The idea is to use new section names for profile name and data sections and mark them as metadata sections. Under this mode, the new sections are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime and can be stripped away as a post-linking step. After the process exits, the generated raw profiles will contains only headers + counters. llvm-profdata can be used correlate raw profiles with the unstripped binary to generate indexed profile. ## Data For chromium base_unittests with code coverage on linux, the binary size overhead due to instrumentation reduced from 64M to 38.8M (39.4%) and the raw profile files size reduce from 128M to 68M (46.9%) ``` $ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped FILE SIZE VM SIZE -------------- -------------- +121% +30.4Mi +121% +30.4Mi .text [NEW] +14.6Mi [NEW] +14.6Mi __llvm_prf_data [NEW] +10.6Mi [NEW] +10.6Mi __llvm_prf_names [NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts +95% +1.75Mi +95% +1.75Mi .eh_frame +108% +400Ki +108% +400Ki .eh_frame_hdr +9.5% +211Ki +9.5% +211Ki .rela.dyn +9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro +5.0% +87.3Ki +5.0% +87.3Ki .rodata [ = ] 0 +13% +47.0Ki .bss +40% +1.78Ki +40% +1.78Ki .got +12% +1.49Ki +12% +1.49Ki .gcc_except_table [ = ] 0 +65% +1.23Ki .relro_padding +62% +1.20Ki [ = ] 0 [Unmapped] +13% +448 +19% +448 .init_array +8.8% +192 [ = ] 0 [ELF Section Headers] +0.0% +136 +0.0% +80 [7 Others] +0.1% +96 +0.1% +96 .dynsym +1.2% +96 +1.2% +96 .rela.plt +1.5% +80 +1.2% +64 .plt [ = ] 0 -99.2% -3.68Ki [LOAD #5 [RW]] +195% +64.0Mi +194% +64.0Mi TOTAL $ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped FILE SIZE VM SIZE -------------- -------------- +121% +30.4Mi +121% +30.4Mi .text [NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts +95% +1.75Mi +95% +1.75Mi .eh_frame +108% +400Ki +108% +400Ki .eh_frame_hdr +9.5% +211Ki +9.5% +211Ki .rela.dyn +9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro +5.0% +87.3Ki +5.0% +87.3Ki .rodata [ = ] 0 +13% +47.0Ki .bss +40% +1.78Ki +40% +1.78Ki .got +12% +1.49Ki +12% +1.49Ki .gcc_except_table +13% +448 +19% +448 .init_array +0.1% +96 +0.1% +96 .dynsym +1.2% +96 +1.2% +96 .rela.plt +1.2% +64 +1.2% +64 .plt +2.9% +64 [ = ] 0 [ELF Section Headers] +0.0% +40 +0.0% +40 .data +1.2% +32 +1.2% +32 .got.plt +0.0% +24 +0.0% +8 [5 Others] [ = ] 0 -22.9% -872 [LOAD #5 [RW]] -74.5% -1.44Ki [ = ] 0 [Unmapped] [ = ] 0 -76.5% -1.45Ki .relro_padding +118% +38.8Mi +117% +38.8Mi TOTAL ``` A few things to note: 1. llvm-profdata doesn't support filter raw profiles by binary id yet, so when a raw profile doesn't belongs to the binary being digested by llvm-profdata, merging will fail. Once this is implemented, llvm-profdata should be able to only merge raw profiles with the same binary id as the binary and discard the rest (with mismatched/missing binary id). The workflow I have in mind is to have scripts invoke llvm-profdata to get all binary ids for all raw profiles, and selectively choose the raw pnrofiles with matching binary id and the binary to llvm-profdata for merging. 2. Note: In COFF, currently they are still loaded into memory but not used. I didn't do it in this patch because I noticed that `.lcovmap` and `.lcovfunc` are loaded into memory. A separate patch will address it. 3. This should works with PGO when value profiling is disabled as debug info correlation currently doing, though I haven't tested this yet.
2023-12-10[NFC][InstrProf] Refactor InstrProfiling lowering pass (#74970)Mircea Trofin1-1/+1
Akin other passes - refactored the name to `InstrProfilingLoweringPass` to better communicate what it does, and split the pass part and the transformation part to avoid needing to initialize object state during `::run`. A subsequent PR will move `InstrLowering` to the .cpp file and rename it to `InstrLowerer`.