aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/LTO/LTO.cpp
AgeCommit message (Collapse)AuthorFilesLines
13 daysCleanup the LLVM exported symbols namespace (#161240)Nicolai Hähnle1-1/+2
There's a pattern throughout LLVM of cl::opts being exported. That in itself is probably a bit unfortunate, but what's especially bad about it is that a lot of those symbols are in the global namespace. Move them into the llvm namespace. While doing this, I noticed some other variables in the global namespace and moved them as well.
2025-09-22[Remarks] Restructure bitstream remarks to be fully standalone (#156715)Tobias Stadler1-4/+4
Currently there are two serialization modes for bitstream Remarks: standalone and separate. The separate mode splits remark metadata (e.g. the string table) from actual remark data. The metadata is written into the object file by the AsmPrinter, while the remark data is stored in a separate remarks file. This means we can't use bitstream remarks with tools like opt that don't generate an object file. Also, it is confusing to post-process bitstream remarks files, because only the standalone files can be read by llvm-remarkutil. We always need to use dsymutil to convert the separate files to standalone files, which only works for MachO. It is not possible for clang/opt to directly emit bitstream remark files in standalone mode, because the string table can only be serialized after all remarks were emitted. Therefore, this change completely removes the separate serialization mode. Instead, the remark string table is now always written to the end of the remarks file. This requires us to tell the serializer when to finalize remark serialization. This automatically happens when the serializer goes out of scope. However, often the remark file goes out of scope before the serializer is destroyed. To diagnose this, I have added an assert to alert users that they need to explicitly call finalizeLLVMOptimizationRemarks. This change paves the way for further improvements to the remark infrastructure, including more tooling (e.g. #159784), size optimizations for bitstream remarks, and more. Pull Request: https://github.com/llvm/llvm-project/pull/156715
2025-09-05[LLD][COFF] Add more `--time-trace` tags for ThinLTO linking (#156471)Alexandre Ganea1-11/+24
In order to better see what's going on during ThinLTO linking, this PR adds more profile tags when using `--time-trace` on a `lld-link.exe` invocation. After PR, linking `clang.exe`: <img width="3839" height="2026" alt="Capture d’écran 2025-09-02 082021" src="https://github.com/user-attachments/assets/bf0c85ba-2f85-4bbf-a5c1-800039b56910" /> Linking a custom (Unreal Engine game) binary gives a completly different picture, probably because of using Unity files, and the sheer amount of input files (here, providing over 60 GB of .OBJs/.LIBs). <img width="1940" height="1008" alt="Capture d’écran 2025-09-02 102048" src="https://github.com/user-attachments/assets/60b28630-7995-45ce-9e8c-13f3cb5312e0" />
2025-08-16Reapply "RuntimeLibcalls: Generate table of libcall name lengths (#153… ↵Matt Arsenault1-1/+1
(#153864) This reverts commit 334e9bf2dd01fbbfe785624c0de477b725cde6f2. Check if llvm-nm exists before building the benchmark.
2025-08-15Revert "RuntimeLibcalls: Generate table of libcall name lengths (#153… ↵gulfemsavrun1-1/+1
(#153864) …210)" This reverts commit 9a14b1d254a43dc0d4445c3ffa3d393bca007ba3. Revert "RuntimeLibcalls: Return StringRef for libcall names (#153209)" This reverts commit cb1228fbd535b8f9fe78505a15292b0ba23b17de. Revert "TableGen: Emit statically generated hash table for runtime libcalls (#150192)" This reverts commit 769a9058c8d04fc920994f6a5bbb03c8a4fbcd05. Reverted three changes because of a CMake error while building llvm-nm as reported in the following PR: https://github.com/llvm/llvm-project/pull/150192#issuecomment-3192223073
2025-08-15RuntimeLibcalls: Return StringRef for libcall names (#153209)Matt Arsenault1-1/+1
Does not yet fully propagate this down into the TargetLowering uses, many of which are relying on null checks on the returned value.
2025-07-30LTO: Fix bug, Res was meant to be InputRes.Peter Collingbourne1-1/+1
2025-07-30LTO: Redesign the CFI !aliases metadata.Peter Collingbourne1-10/+33
With the current aliases metadata we lose information about which groups of aliases survive symbol resolution. This causes various problems such as #150075 where symbol resolution breaks the link between alias groups. In this redesign of the aliases metadata, we stop representing the individual aliases in !aliases. Instead, the individual aliases are represented in !cfi.functions in the same way as functions, and the alias groups (i.e. groups of symbols with the same address) are stored in !aliases. At symbol resolution time, we filter out all non-prevailing members of !aliases; the resulting set is used by LowerTypeTests to recreate the aliases. With this change it is now possible for a jump table entry to refer to an alias in one of the ThinLTO object files (e.g. if a function is non-prevailing but its alias is prevailing), so instead of deleting them, rename them with the ".cfi" suffix. Fixes #150070. Fixes #150075. Reviewers: teresajohnson, vitalybuka Reviewed By: vitalybuka Pull Request: https://github.com/llvm/llvm-project/pull/150690
2025-07-30[LTO][NFC] Switch LTO API from output parameter to return value (#151272)Vitaly Buka1-39/+44
2025-06-29[LTO] Remove an unnecessary cast (NFC) (#146275)Kazu Hirata1-1/+1
&I is already of const uint8_t *.
2025-06-27TableGen: Generate enum for runtime libcall implementations (#144973)Matt Arsenault1-2/+8
Work towards separating the ABI existence of libcalls vs. the lowering selection. Set libcall selection through enums, rather than through raw string names.
2025-06-12[DebugInfo][RemoveDIs] Delete debug-info-format flag (#143746)Jeremy Morse1-3/+1
This flag was used to let us incrementally introduce debug records into LLVM, however everything is now using records. It serves no purpose now, so delete it.
2025-06-09[DebugInfo][RemoveDIs] Rip out the UseNewDbgInfoFormat flag (#143207)Jeremy Morse1-3/+1
Start removing debug intrinsics support -- starting with the flag that controls production of their replacement, debug records. This patch removes the command-line-flag and with it the ability to switch back to intrinsics. The module / function / block level "IsNewDbgInfoFormat" flags get hardcoded to true, I'll to incrementally remove things that depend on those flags.
2025-06-03[llvm] annotate interfaces in llvm/LTO for DLL export (#142499)Andrew Rogers1-1/+2
## Purpose This patch is one in a series of code-mods that annotate LLVM’s public interface for export. This patch annotates the `llvm/LTO` library. These annotations currently have no meaningful impact on the LLVM build; however, they are a prerequisite to support an LLVM Windows DLL (shared library) build. ## Background This effort is tracked in #109483. Additional context is provided in [this discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307), and documentation for `LLVM_ABI` and related annotations is found in the LLVM repo [here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst). The bulk of these changes were generated automatically using the [Interface Definition Scanner (IDS)](https://github.com/compnerd/ids) tool, followed formatting with `git clang-format`. The following manual adjustments were also applied after running IDS on Linux: - Add `LLVM_ABI` to a small number of symbols that require export but are not declared in headers ## Validation Local builds and tests to validate cross-platform compatibility. This included llvm, clang, and lldb on the following configurations: - Windows with MSVC - Windows with Clang - Linux with GCC - Linux with Clang - Darwin with Clang
2025-05-24[LTO] Remove unused includes (NFC) (#141355)Kazu Hirata1-1/+0
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-05-23[DTLTO][LLVM] Integrated Distributed ThinLTO (DTLTO) (#127749)bd1976bris1-14/+382
This patch adds initial support for Integrated Distributed ThinLTO (DTLTO) in LLVM, which manages distribution internally during the traditional link step. This enables compatibility with any build system that supports in-process ThinLTO. In contrast, existing approaches to distributed ThinLTO, which split the thin-link (--thinlto-index-only), backend compilation, and final link into separate steps, require build system support, e.g. Bazel. This patch implements the core DTLTO mechanism, which enables delegation of ThinLTO backend jobs to an external process (the distributor). The distributor can then manage job distribution through systems like Incredibuild. A generic JSON interface is used to communicate with the distributor, allowing for the creation of new distributors (and thus integration with different distribution systems) without modifying LLVM. Please see llvm/docs/dtlto.rst for more details. RFC: https://discourse.llvm.org/t/rfc-integrated-distributed-thinlto/69641 Design Review: https://github.com/llvm/llvm-project/pull/126654
2025-05-04[llvm] Remove unused local variables (NFC) (#138478)Kazu Hirata1-1/+0
2025-04-28Clean up external users of GlobalValue::getGUID(StringRef) (#129644)Owen Rodley1-6/+8
See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801 for context. This is a non-functional change which just changes the interface of GlobalValue, in preparation for future functional changes. This part touches a fair few users, so is split out for ease of review. Future changes to the GlobalValue implementation can then be focused purely on that class. This does the following: * Rename GlobalValue::getGUID(StringRef) to getGUIDAssumingExternalLinkage. This is simply making explicit at the callsite what is currently implicit. * Where possible, migrate users to directly calling getGUID on a GlobalValue instance. * Otherwise, where possible, have them call the newly renamed getGUIDAssumingExternalLinkage, to make the assumption explicit. There are a few cases where neither of the above are possible, as the caller saves and reconstructs the necessary information to compute the GUID themselves. We want to migrate these callers eventually, but for this first step we leave them be.
2025-04-12[ThinLTO] Don't convert functions to declarations if `force-import-all` is ↵Shilei Tian1-1/+6
enabled (#134541) On one hand, we intend to force import all functions when the option is enabled. On the other hand, we currently drop definitions of some functions and convert them to declarations, which contradicts this intent. With this PR, functions will no longer be converted to declarations when `force-import-all` is enabled.
2025-03-22[llvm] Use *Set::insert_range (NFC) (#132591)Kazu Hirata1-2/+2
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch uses insert_range with iterator ranges. For each case, I've verified that foos is defined as make_range(foo_begin(), foo_end()) or in a similar manner.
2025-03-19[LTO][WPD] Suppress WPD on a class if the LTO unit doesn't have the ↵Mingming Liu1-2/+3
prevailing definition of this class (#131721) Before this patch, whole program devirtualization is suppressed on a class if any superclass is visible to regular object files, by recording the class GUID in `VisibleToRegularObjSymbols`. This patch suppresses whole program devirtualization on a class if the LTO unit doesn't have the prevailing definition of this class (e.g., the prevailing definition is in a shared library) Implementation summaries: 1. In llvm/lib/LTO/LTO.cpp, `IsVisibleToRegularObj` is updated to look at the global resolution's `IsPrevailing` bit for ThinLTO and regularLTO. 2. In llvm/tools/llvm-lto2/llvm-lto2.cpp, - three command line options are added so `llvm-lto2` can override `Conf.HasWholeProgramVisibility`, `Conf.ValidateAllVtablesHaveTypeInfos` and `Conf.AllVtablesHaveTypeInfos`. The test case is reduced from a small C++ program (main.cc, lib.cc/h pasted below in [1]). To reproduce the program failure without this patch, compile lib.cc into a shared library, and provide it to a ThinLTO build of main.cc (commands are pasted in [2]). [1] * lib.h ``` #include <cstdio> class Derived { public: void dispatch(); virtual void print(); virtual void sum(); }; void Derived::dispatch() { static_cast<Derived*>(this)->print(); static_cast<Derived*>(this)->sum(); } void Derived::sum() { printf("Derived::sum\n"); } __attribute__((noinline)) void* create(int i); __attribute__((noinline)) void* getPtr(int i); ``` * lib.cc ``` #include "lib.h" #include <cstdio> #include <iostream> class Derived2 : public Derived { public: void print() override { printf("DerivedSharedLib\n"); } void sum() override { printf("DerivedSharedLib::sum\n"); } }; void Derived::print() { printf("Derived\n"); } __attribute__((noinline)) void* create(int i) { if (i & 1) return new Derived2(); return new Derived(); } ``` * main.cc ``` cat main.cc #include "lib.h" class DerivedN : public Derived { public: }; __attribute__((noinline)) void* getPtr(int x) { return new DerivedN(); } int main() { Derived*b = static_cast<Derived*>(create(201)); b->dispatch(); delete b; Derived* a = static_cast<Derived*>(getPtr(202)); a->dispatch(); delete a; return 0; } ``` [2] ``` # compile lib.o in a shared library. $ ./bin/clang++ -O2 -fPIC -c lib.cc -o lib.o $ ./bin/clang++ -shared -o libdata.so lib.o # Provide the shared library in `-ldata` $ ./bin/clang++ -v -g -ldata --save-temps -fno-discard-value-names -Wl,-mllvm,-print-before=wholeprogramdevirt -Wl,-mllvm,-wholeprogramdevirt-check=trap -Rpass=wholeprogramdevirt -Wl,--lto-whole-program-visibility -Wl,--lto-validate-all-vtables-have-type-infos -mllvm -disable-icp=true -Wl,-mllvm,-disable-icp=false -flto=thin -fwhole-program-vtables -fno-split-lto-unit -fuse-ld=lld main.cc -L . -o main >/tmp/wholeprogramdevirt.ir 2>&1 # Run the program hits a segmentation fault with `-Wl,-mllvm,-wholeprogramdevirt-check=trap` $ LD_LIBRARY_PATH=. ./main DerivedSharedLib Trace/breakpoint trap (core dumped) ```
2025-03-07[NFC][LTO] Move GUID calculation into CfiFunctionIndex (#130370)Vitaly Buka1-12/+8
Preparation for CFI Index refactoring, which will fix O(N^2) in ThinLTO indexing.
2025-03-06[IR] Store Triple in Module (NFC) (#129868)Nikita Popov1-2/+3
The module currently stores the target triple as a string. This means that any code that wants to actually use the triple first has to instantiate a Triple, which is somewhat expensive. The change in #121652 caused a moderate compile-time regression due to this. While it would be easy enough to work around, I think that architecturally, it makes more sense to store the parsed Triple in the module, so that it can always be directly queried. For this change, I've opted not to add any magic conversions between std::string and Triple for backwards-compatibilty purses, and instead write out needed Triple()s or str()s explicitly. This is because I think a decent number of them should be changed to work on Triple as well, to avoid unnecessary conversions back and forth. The only interesting part in this patch is that the default triple is Triple("") instead of Triple() to preserve existing behavior. The former defaults to using the ELF object format instead of unknown object format. We should fix that as well.
2024-10-16[LLVM] Add `Intrinsic::getDeclarationIfExists` (#112428)Rahul Joshi1-7/+7
Add `Intrinsic::getDeclarationIfExists` to lookup an existing declaration of an intrinsic in a `Module`.
2024-10-09Fix build failure for [CGData][ThinLTO] Global Outlining with Two-CodeGen ↵Kyungwoo Lee1-1/+1
Rounds (#90933)
2024-10-09[CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds (#90933)Kyungwoo Lee1-5/+232
This feature is enabled by `-codegen-data-thinlto-two-rounds`, which effectively runs the `-codegen-data-generate` and `-codegen-data-use` in two rounds to enable global outlining with ThinLTO. 1. The first round: Run both optimization + codegen with a scratch output. Before running codegen, we serialize the optimized bitcode modules to a temporary path. 2. From the scratch object files, we merge them into the codegen data. 3. The second round: Read the optimized bitcode modules and start the codegen only this time. Using the codegen data, the machine outliner effectively performs the global outlining. Depends on #90934, #110461 and #110463. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
2024-10-07[ThinLTO][NFC] Refactor ThinBackend (#110461)Kyungwoo Lee1-73/+33
This is a prep for https://github.com/llvm/llvm-project/pull/90933. - Change `ThinBackend` from a function to a type. - Store the parallelism level in the type, which will be used when creating two-codegen round backends that inherit this value. - `ThinBackendProc` is hoisted to `LTO.h` from `LTO.cpp` to provide its body for `ThinBackend`. However, `emitFiles()` is still implemented separately in `LTO.cpp`, distinct from its parent class.
2024-10-07Make WriteIndexesThinBackend multi threaded (#109847)Nuri Amari1-45/+61
We've noticed that for large builds executing thin-link can take on the order of 10s of minutes. We are only using a single thread to write the sharded indices and import files for each input bitcode file. While we need to ensure the index file produced lists modules in a deterministic order, that doesn't prevent us from executing the rest of the work in parallel. In this change we use a thread pool to execute as much of the backend's work as possible in parallel. In local testing on a machine with 80 cores, this change makes a thin-link for ~100,000 input files run in ~2 minutes. Without this change it takes upwards of 10 minutes. --------- Co-authored-by: Nuri Amari <nuriamari@fb.com>
2024-10-04[ThinLTO][NFC] Refactor FileCache (#110463)Kyungwoo Lee1-1/+1
This is a prep for https://github.com/llvm/llvm-project/pull/90933. - Change `FileCache` from a function to a type. - Store the cache directory in the type, which will be used when creating additional caches for two-codegen round runs that inherit this value.
2024-10-03[CGData][ThinLTO][NFC] Prep for two-codegen rounds (#90934)Kyungwoo Lee1-35/+40
This is NFC for https://github.com/llvm/llvm-project/pull/90933. - Create a lambda function, `RunBackends`, to group the backend operations into a single function. - Explicitly pass the `CodeGenOnly` argument to thinBackend, instead of depending on a configuration value. Depends on https://github.com/llvm/llvm-project/pull/90304. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
2024-09-10[LTO] Remove unused includes (NFC) (#108110)Kazu Hirata1-1/+0
clangd reports these as unused headers. My manual inspection agrees with the findings.
2024-09-09Re-apply "[NFCI][LTO][lld] Optimize away symbol copies within LTO global ↵Mingming Liu1-4/+27
resolution in ELF" (#107792) Fix the use-after-free bug and re-apply https://github.com/llvm/llvm-project/pull/106193 * Without the fix, the string referenced by `objSym.Name` could be destroyed even if string saver keeps a copy of the referenced string. This caused use-after-free. * The fix ([latest commit](https://github.com/llvm/llvm-project/pull/107792/commits/9776ed44cfb26172480145aed8f59ba78a6fa2ea)) updates `objSym.Name` to reference (via `StringRef`) the string saver's copy. Test: 1. For `lld/test/ELF/lto/asmundef.ll`, its test failure is reproducible with `-DLLVM_USE_SANITIZER=Address` and gone with the fix. 3. Run all tests by following https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild#try-local-changes. * Without the fix, `ELF/lto/asmundef.ll` aborted the multi-stage test at `@@@BUILD_STEP stage2/asan_ubsan check@@@`, defined [here](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh#L30) * With the fix, the [multi-stage test](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh) pass stage2 {asan, ubsan, masan}. This is also the test used by https://lab.llvm.org/buildbot/#/builders/169 **Original commit message** `StringMap<T>` creates a [copy of the string](https://github.com/llvm/llvm-project/blob/d4c519e7b2ac21350ec08b23eda44bf4a2d3c974/llvm/include/llvm/ADT/StringMapEntry.h#L55-L58) for entry insertions and intentionally keep copies [since the implementation optimizes string memory usage](https://github.com/llvm/llvm-project/blob/d4c519e7b2ac21350ec08b23eda44bf4a2d3c974/llvm/include/llvm/ADT/StringMap.h#L124). On the other hand, linker keeps copies of symbol names [1] in `lld::elf::parseFiles` [2] before invoking `compileBitcodeFiles` [3]. This change proposes to optimize away string copies inside [LTO::GlobalResolutions](https://github.com/llvm/llvm-project/blob/24e791b4164986a1ca7776e3ae0292ef20d20c47/llvm/include/llvm/LTO/LTO.h#L409), which will make LTO indexing more memory efficient for ELF. There are similar opportunities for other (COFF, wasm, MachO) formats. The optimization takes place for lld (ELF) only. For the rest of use cases (gold plugin, `llvm-lto2`, etc), LTO owns a string saver to keep copies and use global resolution key for de-duplication. Together with @kazutakahirata's work to make `ComputeCrossModuleImport` more memory efficient, we see a ~20% peak memory usage reduction in a binary where peak memory usage needs to go down. Thanks to the optimization in https://github.com/llvm/llvm-project/commit/329ba523ccbbe68a12434926c92fd9a86494d958, the max (as opposed to the sum) of `ComputeCrossModuleImport` or `GlobalResolution` shows up in peak memory usage. * Regarding correctness, the set of [resolved](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/lib/LTO/LTO.cpp#L739) [per-module symbols](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/include/llvm/LTO/LTO.h#L188-L191) is a subset of [llvm::lto::InputFile::Symbols](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/include/llvm/LTO/LTO.h#L120). And bitcode symbol parsing saves symbol name when iterating `obj->symbols` in `BitcodeFile::parse` already. This change updates `BitcodeFile::parseLazy` to keep copies of per-module undefined symbols. * Presumably the undefined symbols in a LTO unit (copied in this patch in linker unique saver) is a small set compared with the set of symbols in global-resolution (copied before this patch), making this a worthwhile trade-off. Benchmarking this change alone shows measurable memory savings across various benchmarks. [1] ELF https://github.com/llvm/llvm-project/blob/1cea5c2138bef3d8fec75508df6dbb858e6e3560/lld/ELF/InputFiles.cpp#L1748 [2] https://github.com/llvm/llvm-project/blob/ef7b18a53c0d186dcda1e322be6035407fdedb55/lld/ELF/Driver.cpp#L2863 [3] https://github.com/llvm/llvm-project/blob/ef7b18a53c0d186dcda1e322be6035407fdedb55/lld/ELF/Driver.cpp#L2995
2024-09-08Revert "[NFCI][LTO][lld] Optimize away symbol copies within LTO global ↵Mingming Liu1-27/+4
resolution in ELF" (#107788) Reverts llvm/llvm-project#106193 while investigating bot failures https://lab.llvm.org/buildbot/#/builders/169/builds/2989/steps/9/logs/stdio
2024-09-08[NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ↵Mingming Liu1-4/+27
ELF (#106193) `StringMap<T>` creates a [copy of the string](https://github.com/llvm/llvm-project/blob/d4c519e7b2ac21350ec08b23eda44bf4a2d3c974/llvm/include/llvm/ADT/StringMapEntry.h#L55-L58) for entry insertions and intentionally keep copies [since the implementation optimizes string memory usage](https://github.com/llvm/llvm-project/blob/d4c519e7b2ac21350ec08b23eda44bf4a2d3c974/llvm/include/llvm/ADT/StringMap.h#L124). On the other hand, linker keeps copies of symbol names [1] in `lld::elf::parseFiles` [2] before invoking `compileBitcodeFiles` [3]. This change proposes to optimize away string copies inside [LTO::GlobalResolutions](https://github.com/llvm/llvm-project/blob/24e791b4164986a1ca7776e3ae0292ef20d20c47/llvm/include/llvm/LTO/LTO.h#L409), which will make LTO indexing more memory efficient for ELF. There are similar opportunities for other (COFF, wasm, MachO) formats. The optimization takes place for lld (ELF) only. For the rest of use cases (gold plugin, `llvm-lto2`, etc), LTO owns a string saver to keep copies and use global resolution key for de-duplication. Together with @kazutakahirata's work to make `ComputeCrossModuleImport` more memory efficient, we see a ~20% peak memory usage reduction in a binary where peak memory usage needs to go down. Thanks to the optimization in https://github.com/llvm/llvm-project/commit/329ba523ccbbe68a12434926c92fd9a86494d958, the max (as opposed to the sum) of `ComputeCrossModuleImport` or `GlobalResolution` shows up in peak memory usage. * Regarding correctness, the set of [resolved](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/lib/LTO/LTO.cpp#L739) [per-module symbols](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/include/llvm/LTO/LTO.h#L188-L191) is a subset of [llvm::lto::InputFile::Symbols](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/include/llvm/LTO/LTO.h#L120). And bitcode symbol parsing saves symbol name when iterating `obj->symbols` in `BitcodeFile::parse` already. This change updates `BitcodeFile::parseLazy` to keep copies of per-module undefined symbols. * Presumably the undefined symbols in a LTO unit (copied in this patch in linker unique saver) is a small set compared with the set of symbols in global-resolution (copied before this patch), making this a worthwhile trade-off. Benchmarking this change alone shows measurable memory savings across various benchmarks. [1] ELF https://github.com/llvm/llvm-project/blob/1cea5c2138bef3d8fec75508df6dbb858e6e3560/lld/ELF/InputFiles.cpp#L1748 [2] https://github.com/llvm/llvm-project/blob/ef7b18a53c0d186dcda1e322be6035407fdedb55/lld/ELF/Driver.cpp#L2863 [3] https://github.com/llvm/llvm-project/blob/ef7b18a53c0d186dcda1e322be6035407fdedb55/lld/ELF/Driver.cpp#L2995
2024-09-06[NFCI]Remove EntryCount from FunctionSummary and clean up surrounding ↵Mingming Liu1-4/+0
synthetic count passes. (#107471) The primary motivation is to remove `EntryCount` from `FunctionSummary`. This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of https://github.com/llvm/llvm-project/commit/64498c54831bed9cf069e0923b9b73678c6451d8). While I'm at it, this PR clean up {SummaryBasedOptimizations, SyntheticCountsPropagation} since they were not used and there are no plans to further invest on them. With this patch, bitcode writer writes a placeholder 0 at the byte offset of `EntryCount` and bitcode reader can parse the function entry count at the correct byte offset. Added a TODO to stop writing `EntryCount` and bump bitcode version
2024-09-03[ThinLTO] Don't always print ModulesToCompile debugging information (#106769)Nick Sarnie1-2/+2
Nothing went wrong in this case, we just successfully matched a module by identifier. No need to print to std::error like we would for something that should be user-visible. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2024-09-01[LTO] Reduce memory usage for import lists (#106772)Kazu Hirata1-52/+32
This patch reduces the memory usage for import lists by employing memory-efficient data structures. With this patch, an import list for a given destination module is basically DenseSet<uint32_t> with each element indexing into the deduplication table containing tuples of: {SourceModule, GUID, Definition/Declaration} In one of our large applications, the peak memory usage goes down by 9.2% from 6.120GB to 5.555GB during the LTO indexing step. This patch addresses several sources of space inefficiency associated with std::unordered_map: - std::unordered_map<GUID, ImportKind> takes up 16 bytes because of padding even though ImportKind only carries one bit of information. - std::unordered_map uses pointers to elements, both in the hash table proper and for collision chains. - We allocate an instance of std::unordered_map for each {Destination Module, Source Module} pair for which we have at least one import. Most import lists have less than 10 imports, so the metadata like the size of std::unordered_map and the pointer to the hash table costs a lot relative to the actual contents.
2024-08-28[LTO] Introduce new type alias ImportListsTy (NFC) (#106420)Kazu Hirata1-2/+1
The background is as follows. I'm planning to reduce the memory footprint of ThinLTO indexing by changing ImportMapTy, the data structure used for an import list. Once this patch lands, I'm planning to change the type slightly. The new type alias allows us to update the type without touching many places.
2024-08-23[IR] Inroduce ModuleToSummariesForIndexTy (NFC) (#105906)Kazu Hirata1-1/+1
This patch introduces type alias ModuleToSummariesForIndexTy. I'm planning to change the type slightly to allow heterogeneous lookup (that is, std::map<K, V, std::less<>>) in a subsequent patch. The problem is that changing the type affects many places. Using a type alias reduces the impact.
2024-08-22[LTO] Turn ImportMapTy into a proper class (NFC) (#105748)Kazu Hirata1-4/+5
This patch turns type alias ImportMapTy into a proper class to provide a more intuitive interface like: ImportList.addDefinition(...) as opposed to: FunctionImporter::addDefinition(ImportList, ...) Also, this patch requires all non-const accesses to go through addDefinition, maybeAddDeclaration, and addGUID while providing const accesses via: const ImportMapTyImpl &getImportMap() const { return ImportMap; } I realize ImportMapTy may not be the best name as a class (maybe OK as a type alias). I am not renaming ImportMapTy in this patch at least because there are 47 mentions of ImportMapTy under llvm/.
2024-08-21[LTO] Use a range-based for loop (NFC) (#105467)Kazu Hirata1-2/+2
2024-08-21[LTO] Use DenseSet in computeLTOCacheKey (NFC) (#105466)Kazu Hirata1-6/+6
The two instances of std::set are used only for membership checking purposes in computeLTOCacheKey. We do not need std::set's strengths like iterators staying valid or the ability to traverse in a sorted order. This patch changes them to DenseSet. While I am at it, this patch replaces count with contains for slightly increased readability.
2024-08-20[LTO] Teach computeLTOCacheKey to return std::string (NFC) (#105331)Kazu Hirata1-9/+8
Without this patch, computeLTOCacheKey computes SHA1, creates its hexadecimal representation with toHex, which returns std::string, and then copies it to an output parameter of type SmallString. This patch removes the redirection and teaches computeLTOCacheKey to directly return std::string computed by toHex. With the move semantics, no buffer copy should be involved. While I am at it, this patch adds a Twine to concatenate two strings.
2024-07-20Reapply "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped ↵Joseph Huber1-8/+7
(#98512)" This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5. I moved the `ISD` dependencies into the CodeGen portion of the handling, it's a little awkward but it's the easiest solution I can think of for now.
2024-07-20ReformatNAKAMURA Takumi1-1/+1
2024-07-20Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped ↵NAKAMURA Takumi1-7/+7
(#98512)" This reverts commit c05126bdfc3b02daa37d11056fa43db1a6cdef69. (llvmorg-19-init-17714-gc05126bdfc3b) See #99610
2024-07-16[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)Joseph Huber1-7/+7
Summary: The LTO pass and LLD linker have logic in them that forces extraction and prevent internalization of needed runtime calls. However, these currently take all RTLibcalls into account, even if the target does not support them. The target opts-out of a libcall if it sets its name to nullptr. This patch pulls this logic out into a class in the header so that LTO / lld can use it to determine if a symbol actually needs to be kept. This is important for targets like AMDGPU that want to be able to use `lld` to perform the final link step, but does not want the overhead of uncalled functions. (This adds like a second to the link time trivially)
2024-07-08Reland "[ThinLTO][Bitcode] Generate import type in bitcode" (#97253)Mingming Liu1-3/+5
https://github.com/llvm/llvm-project/pull/87600 was reverted in order to revert https://github.com/llvm/llvm-project/commit/6262763341fcd71a2b0708cf7485f9abd1d26ba8. Now https://github.com/llvm/llvm-project/pull/95482 is fix forward for https://github.com/llvm/llvm-project/commit/6262763341fcd71a2b0708cf7485f9abd1d26ba8. This patch is a reland for https://github.com/llvm/llvm-project/pull/87600 **Changes on top of original patch** In `llvm/include/llvm/IR/ModuleSummaryIndex.h`, make the type of `GVSummaryPtrSet` an `unordered_set` which is more memory efficient when the number of elements is smaller than 128 [1] **Original commit message** For distributed ThinLTO, the LTO indexing step generates combined summary for each module, and postlink pipeline reads the combined summary which stores the information for link-time optimization. This patch populates the 'import type' of a summary in bitcode, and updates bitcode reader to parse the bit correctly. [1] https://github.com/llvm/llvm-project/blob/393eff4e02e7ab3d234d246a8d6912c8e745e6f9/llvm/lib/Support/SmallPtrSet.cpp#L43
2024-07-03[ThinLTO] Use a set rather than a map to track exported ValueInfos. (#97360)Mingming Liu1-8/+6
https://github.com/llvm/llvm-project/pull/95482 is a reland of https://github.com/llvm/llvm-project/pull/88024. https://github.com/llvm/llvm-project/pull/95482 keeps indexing memory usage reasonable by using unordered_map and doesn't make other changes to originally reviewed code. While discussing possible ways to minimize indexing memory usage, Teresa asked whether I need `ExportSetTy` as a map or a set is sufficient. This PR implements the idea. It uses a set rather than a map to track exposed ValueInfos. Currently, `ExportLists` has two use cases, and neither needs to track a ValueInfo's import/export status. So using a set is sufficient and correct. 1) In both in-process and distributed ThinLTO, it's used to decide if a function or global variable is visible [1] from another module after importing creates additional cross-module references. * If a cross-module call edge is seen today, the callee must be visible to another module without keeping track of its export status already. For instance, this [2] is how callees of direct calls get exported. 2) For in-process ThinLTO [3], it's used to compute lto cache key. * The cache key computation already hashes [4] 'ImportList' , and 'ExportList' is determined by 'ImportList'. So it's fine to not track 'import type' for export list. [1] https://github.com/llvm/llvm-project/blob/66cd8ec4c08252ebc73c82e4883a8da247ed146b/llvm/lib/LTO/LTO.cpp#L1815-L1819 [2] https://github.com/llvm/llvm-project/blob/66cd8ec4c08252ebc73c82e4883a8da247ed146b/llvm/lib/LTO/LTO.cpp#L1783-L1794 [3] https://github.com/llvm/llvm-project/blob/66cd8ec4c08252ebc73c82e4883a8da247ed146b/llvm/lib/LTO/LTO.cpp#L1494-L1496 [4] https://github.com/llvm/llvm-project/blob/b76100e220591fab2bf0a4917b216439f7aa4b09/llvm/lib/LTO/LTO.cpp#L194-L222
2024-06-20Reland "[ThinLTO] Populate declaration import status except for distributed ↵Mingming Liu1-12/+20
ThinLTO under a default-off new option" (#95482) Make `FunctionsToImportTy` an `unordered_map` rather than `DenseMap`. Credit goes to jvoung@ for the 'DenseMap -> unordered_map' change. This is a reland of https://github.com/llvm/llvm-project/pull/92718 * `DenseMap` allocates space for a large number of key/value pairs and wastes space when the number of elements are small. * While init bucket size is zero [1], it quickly allocates buckets for 64 elements [2] when the number of elements is small (for example, 3 or 4 elements). The programmer manual [3] also mentions it could waste space. * Experiments show `FunctionsToImportTy.size()` is smaller than 4 for multiple binaries with high indexing ram usage. `unordered_map` grows factor is at most 2 in llvm libc [4] for insert operations. With this change, `ComputeCrossModuleImport` ram increase is smaller than 0.5G on a couple of binaries with high indexing ram usage. A wider range of (pre-release) tests pass. [1] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L431-L432 [2] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L849 [3] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-densemap-h [4] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/libcxx/include/__hash_table#L1525-L1526 **Original commit message** The goal is to populate `declaration` import status if a new flag `-import-declaration` is on. * For in-process ThinLTO, the `declaration` status is visible to backend `function-import` pass, so `FunctionImporter::importFunctions` should read the import status and be no-op for declaration summaries. Basically, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle `declaration` summaries. Two use cases ([better call-graph sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5) or [cross-module auto-init](https://github.com/llvm/llvm-project/pull/87597#discussion_r1556067195)) would use this bit differently. * For distributed ThinLTO, the `declaration` status is not serialized to bitcode. As discussed, https://github.com/llvm/llvm-project/pull/87600 will do this.