path: root/llvm/lib/LTO
Age | Commit message | Author | Files | Lines
2024-10-09 | Fix build failure for [CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds (#90933) | Kyungwoo Lee | 1 | -1/+1
2024-10-09 | [CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds (#90933) | Kyungwoo Lee | 3 | -6/+244
This feature is enabled by `-codegen-data-thinlto-two-rounds`, which effectively runs `-codegen-data-generate` and `-codegen-data-use` in two rounds to enable global outlining with ThinLTO.
1. The first round: run both optimization and codegen with a scratch output. Before running codegen, we serialize the optimized bitcode modules to a temporary path.
2. Merge the scratch object files into the codegen data.
3. The second round: read the optimized bitcode modules and run codegen only this time. Using the codegen data, the machine outliner effectively performs the global outlining.
Depends on #90934, #110461 and #110463.
This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
2024-10-07 | [ThinLTO][NFC] Refactor ThinBackend (#110461) | Kyungwoo Lee | 1 | -73/+33
This is a prep for https://github.com/llvm/llvm-project/pull/90933.
- Change `ThinBackend` from a function to a type.
- Store the parallelism level in the type, which will be used when creating two-codegen round backends that inherit this value.
- `ThinBackendProc` is hoisted to `LTO.h` from `LTO.cpp` to provide its body for `ThinBackend`. However, `emitFiles()` is still implemented separately in `LTO.cpp`, distinct from its parent class.
2024-10-07 | Make WriteIndexesThinBackend multi threaded (#109847) | Nuri Amari | 2 | -48/+63
We've noticed that for large builds, executing thin-link can take on the order of tens of minutes. We are only using a single thread to write the sharded indices and import files for each input bitcode file. While we need to ensure the index file produced lists modules in a deterministic order, that doesn't prevent us from executing the rest of the work in parallel. In this change we use a thread pool to execute as much of the backend's work as possible in parallel. In local testing on a machine with 80 cores, this change makes a thin-link for ~100,000 input files run in ~2 minutes. Without this change it takes upwards of 10 minutes.
Co-authored-by: Nuri Amari <nuriamari@fb.com>
2024-10-04 | [ThinLTO][NFC] Refactor FileCache (#110463) | Kyungwoo Lee | 1 | -1/+1
This is a prep for https://github.com/llvm/llvm-project/pull/90933.
- Change `FileCache` from a function to a type.
- Store the cache directory in the type, which will be used when creating additional caches for two-codegen round runs that inherit this value.
2024-10-03 | [CGData][ThinLTO][NFC] Prep for two-codegen rounds (#90934) | Kyungwoo Lee | 2 | -37/+44
This is NFC for https://github.com/llvm/llvm-project/pull/90933.
- Create a lambda function, `RunBackends`, to group the backend operations into a single function.
- Explicitly pass the `CodeGenOnly` argument to thinBackend, instead of depending on a configuration value.
Depends on https://github.com/llvm/llvm-project/pull/90304.
This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
2024-09-10 | [LTO] Remove unused includes (NFC) (#108110) | Kazu Hirata | 1 | -1/+0
clangd reports these as unused headers. My manual inspection agrees with the findings.
2024-09-09 | Re-apply "[NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ELF" (#107792) | Mingming Liu | 1 | -4/+27
Fix the use-after-free bug and re-apply https://github.com/llvm/llvm-project/pull/106193
* Without the fix, the string referenced by `objSym.Name` could be destroyed even if the string saver keeps a copy of the referenced string. This caused a use-after-free.
* The fix ([latest commit](https://github.com/llvm/llvm-project/pull/107792/commits/9776ed44cfb26172480145aed8f59ba78a6fa2ea)) updates `objSym.Name` to reference (via `StringRef`) the string saver's copy.
Test:
1. For `lld/test/ELF/lto/asmundef.ll`, its test failure is reproducible with `-DLLVM_USE_SANITIZER=Address` and gone with the fix.
2. Run all tests by following https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild#try-local-changes.
   * Without the fix, `ELF/lto/asmundef.ll` aborted the multi-stage test at `@@@BUILD_STEP stage2/asan_ubsan check@@@`, defined [here](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh#L30)
   * With the fix, the [multi-stage test](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh) passes stage2 {asan, ubsan, masan}. This is also the test used by https://lab.llvm.org/buildbot/#/builders/169
**Original commit message**
`StringMap<T>` creates a [copy of the string](https://github.com/llvm/llvm-project/blob/d4c519e7b2ac21350ec08b23eda44bf4a2d3c974/llvm/include/llvm/ADT/StringMapEntry.h#L55-L58) for entry insertions and intentionally keeps copies [since the implementation optimizes string memory usage](https://github.com/llvm/llvm-project/blob/d4c519e7b2ac21350ec08b23eda44bf4a2d3c974/llvm/include/llvm/ADT/StringMap.h#L124). On the other hand, the linker keeps copies of symbol names [1] in `lld::elf::parseFiles` [2] before invoking `compileBitcodeFiles` [3].
This change proposes to optimize away string copies inside [LTO::GlobalResolutions](https://github.com/llvm/llvm-project/blob/24e791b4164986a1ca7776e3ae0292ef20d20c47/llvm/include/llvm/LTO/LTO.h#L409), which will make LTO indexing more memory efficient for ELF. There are similar opportunities for other (COFF, wasm, MachO) formats.
The optimization takes place for lld (ELF) only. For the rest of the use cases (gold plugin, `llvm-lto2`, etc), LTO owns a string saver to keep copies and uses the global resolution key for de-duplication.
Together with @kazutakahirata's work to make `ComputeCrossModuleImport` more memory efficient, we see a ~20% peak memory usage reduction in a binary where peak memory usage needs to go down. Thanks to the optimization in https://github.com/llvm/llvm-project/commit/329ba523ccbbe68a12434926c92fd9a86494d958, the max (as opposed to the sum) of `ComputeCrossModuleImport` or `GlobalResolution` shows up in peak memory usage.
* Regarding correctness, the set of [resolved](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/lib/LTO/LTO.cpp#L739) [per-module symbols](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/include/llvm/LTO/LTO.h#L188-L191) is a subset of [llvm::lto::InputFile::Symbols](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/include/llvm/LTO/LTO.h#L120). And bitcode symbol parsing already saves the symbol name when iterating `obj->symbols` in `BitcodeFile::parse`. This change updates `BitcodeFile::parseLazy` to keep copies of per-module undefined symbols.
* Presumably the undefined symbols in an LTO unit (copied in this patch in the linker's unique saver) are a small set compared with the set of symbols in global-resolution (copied before this patch), making this a worthwhile trade-off. Benchmarking this change alone shows measurable memory savings across various benchmarks.
[1] ELF https://github.com/llvm/llvm-project/blob/1cea5c2138bef3d8fec75508df6dbb858e6e3560/lld/ELF/InputFiles.cpp#L1748
[2] https://github.com/llvm/llvm-project/blob/ef7b18a53c0d186dcda1e322be6035407fdedb55/lld/ELF/Driver.cpp#L2863
[3] https://github.com/llvm/llvm-project/blob/ef7b18a53c0d186dcda1e322be6035407fdedb55/lld/ELF/Driver.cpp#L2995
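The lifetime issue described above can be illustrated outside of LLVM. The following is a minimal sketch, assuming `std::string_view` as a stand-in for `StringRef`; `SaverSketch`, `SymbolSketch`, and `internName` are hypothetical names for illustration and are not the lld or LTO API.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <string_view>

// Hypothetical stand-in for a string saver: owns copies whose storage stays
// valid for the saver's lifetime (std::deque never relocates existing
// elements on emplace_back).
class SaverSketch {
  std::deque<std::string> Storage;

public:
  std::string_view save(std::string_view S) {
    Storage.emplace_back(S);
    std::string_view Saved = Storage.back();
    assert(Saved == S && "saved copy must match the input");
    return Saved;
  }
};

// Hypothetical symbol record holding a non-owning name, like a StringRef.
struct SymbolSketch {
  std::string_view Name;
};

// Re-point the symbol at the saver's copy so it no longer depends on the
// lifetime of whatever buffer the name originally came from.
void internName(SymbolSketch &Sym, SaverSketch &Saver) {
  Sym.Name = Saver.save(Sym.Name);
}
```

Without the call to `internName`, `Sym.Name` would keep pointing at the original buffer and dangle once that buffer is destroyed, which is the shape of the bug the commit message describes.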
2024-09-08 | Revert "[NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ELF" (#107788) | Mingming Liu | 1 | -27/+4
Reverts llvm/llvm-project#106193 while investigating bot failures https://lab.llvm.org/buildbot/#/builders/169/builds/2989/steps/9/logs/stdio
2024-09-08 | [NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ELF (#106193) | Mingming Liu | 1 | -4/+27
`StringMap<T>` creates a [copy of the string](https://github.com/llvm/llvm-project/blob/d4c519e7b2ac21350ec08b23eda44bf4a2d3c974/llvm/include/llvm/ADT/StringMapEntry.h#L55-L58) for entry insertions and intentionally keeps copies [since the implementation optimizes string memory usage](https://github.com/llvm/llvm-project/blob/d4c519e7b2ac21350ec08b23eda44bf4a2d3c974/llvm/include/llvm/ADT/StringMap.h#L124). On the other hand, the linker keeps copies of symbol names [1] in `lld::elf::parseFiles` [2] before invoking `compileBitcodeFiles` [3].
This change proposes to optimize away string copies inside [LTO::GlobalResolutions](https://github.com/llvm/llvm-project/blob/24e791b4164986a1ca7776e3ae0292ef20d20c47/llvm/include/llvm/LTO/LTO.h#L409), which will make LTO indexing more memory efficient for ELF. There are similar opportunities for other (COFF, wasm, MachO) formats.
The optimization takes place for lld (ELF) only. For the rest of the use cases (gold plugin, `llvm-lto2`, etc), LTO owns a string saver to keep copies and uses the global resolution key for de-duplication.
Together with @kazutakahirata's work to make `ComputeCrossModuleImport` more memory efficient, we see a ~20% peak memory usage reduction in a binary where peak memory usage needs to go down. Thanks to the optimization in https://github.com/llvm/llvm-project/commit/329ba523ccbbe68a12434926c92fd9a86494d958, the max (as opposed to the sum) of `ComputeCrossModuleImport` or `GlobalResolution` shows up in peak memory usage.
* Regarding correctness, the set of [resolved](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/lib/LTO/LTO.cpp#L739) [per-module symbols](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/include/llvm/LTO/LTO.h#L188-L191) is a subset of [llvm::lto::InputFile::Symbols](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/include/llvm/LTO/LTO.h#L120). And bitcode symbol parsing already saves the symbol name when iterating `obj->symbols` in `BitcodeFile::parse`. This change updates `BitcodeFile::parseLazy` to keep copies of per-module undefined symbols.
* Presumably the undefined symbols in an LTO unit (copied in this patch in the linker's unique saver) are a small set compared with the set of symbols in global-resolution (copied before this patch), making this a worthwhile trade-off. Benchmarking this change alone shows measurable memory savings across various benchmarks.
[1] ELF https://github.com/llvm/llvm-project/blob/1cea5c2138bef3d8fec75508df6dbb858e6e3560/lld/ELF/InputFiles.cpp#L1748
[2] https://github.com/llvm/llvm-project/blob/ef7b18a53c0d186dcda1e322be6035407fdedb55/lld/ELF/Driver.cpp#L2863
[3] https://github.com/llvm/llvm-project/blob/ef7b18a53c0d186dcda1e322be6035407fdedb55/lld/ELF/Driver.cpp#L2995
2024-09-06 | [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (#107471) | Mingming Liu | 4 | -98/+1
The primary motivation is to remove `EntryCount` from `FunctionSummary`. This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of https://github.com/llvm/llvm-project/commit/64498c54831bed9cf069e0923b9b73678c6451d8). While I'm at it, this PR cleans up {SummaryBasedOptimizations, SyntheticCountsPropagation} since they were not used and there are no plans to further invest in them. With this patch, the bitcode writer writes a placeholder 0 at the byte offset of `EntryCount` and the bitcode reader can parse the function entry count at the correct byte offset. Added a TODO to stop writing `EntryCount` and bump the bitcode version.
2024-09-03 | [ThinLTO] Don't always print ModulesToCompile debugging information (#106769) | Nick Sarnie | 1 | -2/+2
Nothing went wrong in this case, we just successfully matched a module by identifier. No need to print to std::error like we would for something that should be user-visible. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2024-09-01 | [LTO] Reduce memory usage for import lists (#106772) | Kazu Hirata | 1 | -52/+32
This patch reduces the memory usage for import lists by employing memory-efficient data structures. With this patch, an import list for a given destination module is basically DenseSet<uint32_t> with each element indexing into the deduplication table containing tuples of:
{SourceModule, GUID, Definition/Declaration}
In one of our large applications, the peak memory usage goes down by 9.2% from 6.120GB to 5.555GB during the LTO indexing step.
This patch addresses several sources of space inefficiency associated with std::unordered_map:
- std::unordered_map<GUID, ImportKind> takes up 16 bytes because of padding even though ImportKind only carries one bit of information.
- std::unordered_map uses pointers to elements, both in the hash table proper and for collision chains.
- We allocate an instance of std::unordered_map for each {Destination Module, Source Module} pair for which we have at least one import. Most import lists have fewer than 10 imports, so the metadata like the size of std::unordered_map and the pointer to the hash table costs a lot relative to the actual contents.
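The deduplication-table idea can be sketched with standard containers. This is an illustrative approximation only: `ImportIDTable`, `ImportEntry`, and `ImportList` are hypothetical names, and `std::unordered_set<uint32_t>` stands in for `DenseSet<uint32_t>`; the real data structures in the patch may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <unordered_set>
#include <vector>

enum class ImportKind { Definition, Declaration };

// One entry of the shared table: {SourceModule, GUID, Definition/Declaration}.
using ImportEntry = std::tuple<std::string, uint64_t, ImportKind>;

// Assigns a stable 32-bit ID to each distinct entry, so that every import
// list can refer to an entry by a small integer instead of storing it.
class ImportIDTable {
  std::map<ImportEntry, uint32_t> IDs;
  std::vector<ImportEntry> Entries;

public:
  uint32_t getOrCreate(const ImportEntry &E) {
    auto [It, Inserted] =
        IDs.try_emplace(E, static_cast<uint32_t>(Entries.size()));
    if (Inserted)
      Entries.push_back(E);
    return It->second;
  }

  const ImportEntry &lookup(uint32_t ID) const {
    assert(ID < Entries.size() && "unknown import ID");
    return Entries[ID];
  }

  std::size_t size() const { return Entries.size(); }
};

// A per-destination import list is then just a set of IDs.
using ImportList = std::unordered_set<uint32_t>;
```

Two destination modules importing the same definition share a single table entry and each store only a 4-byte ID, which is the kind of saving the commit message attributes to the new layout.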
2024-08-28 | [LTO] Introduce new type alias ImportListsTy (NFC) (#106420) | Kazu Hirata | 2 | -8/+7
The background is as follows. I'm planning to reduce the memory footprint of ThinLTO indexing by changing ImportMapTy, the data structure used for an import list. Once this patch lands, I'm planning to change the type slightly. The new type alias allows us to update the type without touching many places.
2024-08-23 | [IR] Introduce ModuleToSummariesForIndexTy (NFC) (#105906) | Kazu Hirata | 2 | -3/+3
This patch introduces type alias ModuleToSummariesForIndexTy. I'm planning to change the type slightly to allow heterogeneous lookup (that is, std::map<K, V, std::less<>>) in a subsequent patch. The problem is that changing the type affects many places. Using a type alias reduces the impact.
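For readers unfamiliar with heterogeneous lookup, here is a small sketch using only the standard library. `SummaryCounts` and `lookupCount` are hypothetical names with a simplified value type; they are not the actual `ModuleToSummariesForIndexTy` definition.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// std::less<> is a transparent comparator, so find() accepts any type that
// can be compared with the key, without first building a std::string.
using SummaryCounts = std::map<std::string, int, std::less<>>;

// Returns the stored count for ModulePath, or -1 if it is absent.
int lookupCount(const SummaryCounts &M, std::string_view ModulePath) {
  assert(!ModulePath.empty() && "expected a module path");
  auto It = M.find(ModulePath); // No temporary std::string is created.
  return It == M.end() ? -1 : It->second;
}
```

With the default `std::less<std::string>`, the same `find` call would not compile for a `std::string_view` argument, because `std::string` has no implicit conversion from `std::string_view`; the caller would have to construct a temporary key.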
2024-08-22 | [LTO] Turn ImportMapTy into a proper class (NFC) (#105748) | Kazu Hirata | 2 | -6/+6
This patch turns type alias ImportMapTy into a proper class to provide a more intuitive interface like:
  ImportList.addDefinition(...)
as opposed to:
  FunctionImporter::addDefinition(ImportList, ...)
Also, this patch requires all non-const accesses to go through addDefinition, maybeAddDeclaration, and addGUID while providing const accesses via:
  const ImportMapTyImpl &getImportMap() const { return ImportMap; }
I realize ImportMapTy may not be the best name as a class (maybe OK as a type alias). I am not renaming ImportMapTy in this patch at least because there are 47 mentions of ImportMapTy under llvm/.
2024-08-22 | [LTO] Introduce helper functions to add GUIDs to ImportList (NFC) (#105555) | Kazu Hirata | 1 | -8/+2
The new helper functions make the intent clearer while hiding implementation details, including how we handle previously added entries. Note that:
- If we are adding a GUID as a GlobalValueSummary::Definition, then we override a previously added GlobalValueSummary::Declaration entry for the same GUID.
- If we are adding a GUID as a GlobalValueSummary::Declaration, then a previously added GlobalValueSummary::Definition entry for the same GUID takes precedence, and no change is made.
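The precedence rule in the two bullets above can be sketched as follows. The class `ImportListSketch` is hypothetical and its method signatures are simplified to take only a GUID; the helpers in the actual patch take more context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

enum class ImportKind { Declaration, Definition };

class ImportListSketch {
  std::unordered_map<uint64_t, ImportKind> Entries;

public:
  // A definition always wins: it overrides an earlier declaration entry.
  void addDefinition(uint64_t GUID) { Entries[GUID] = ImportKind::Definition; }

  // A declaration is recorded only if the GUID has no entry yet, so an
  // existing definition takes precedence and is left unchanged.
  void maybeAddDeclaration(uint64_t GUID) {
    Entries.try_emplace(GUID, ImportKind::Declaration);
  }

  ImportKind kindOf(uint64_t GUID) const {
    auto It = Entries.find(GUID);
    assert(It != Entries.end() && "GUID was never added");
    return It->second;
  }

  std::size_t size() const { return Entries.size(); }
};
```

Because `try_emplace` does nothing when the key already exists, the order in which a definition and a declaration for the same GUID arrive does not affect the final state.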
2024-08-21 | [LTO] Use a range-based for loop (NFC) (#105467) | Kazu Hirata | 1 | -2/+2
2024-08-21 | [LTO] Use DenseSet in computeLTOCacheKey (NFC) (#105466) | Kazu Hirata | 1 | -6/+6
The two instances of std::set are used only for membership checking purposes in computeLTOCacheKey. We do not need std::set's strengths like iterators staying valid or the ability to traverse in a sorted order. This patch changes them to DenseSet. While I am at it, this patch replaces count with contains for slightly increased readability.
2024-08-20 | [LTO] Teach computeLTOCacheKey to return std::string (NFC) (#105331) | Kazu Hirata | 2 | -13/+12
Without this patch, computeLTOCacheKey computes SHA1, creates its hexadecimal representation with toHex, which returns std::string, and then copies it to an output parameter of type SmallString. This patch removes the redirection and teaches computeLTOCacheKey to directly return std::string computed by toHex. With the move semantics, no buffer copy should be involved. While I am at it, this patch adds a Twine to concatenate two strings.
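A rough before/after sketch of the interface change, assuming a hand-written hex encoder: `toHexSketch`, `computeKeyInto`, and `computeKey` are hypothetical names, and `std::string` stands in for both `SmallString` and the real hashing code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical hex encoder that returns its result by value.
std::string toHexSketch(const std::vector<uint8_t> &Bytes) {
  static const char Digits[] = "0123456789ABCDEF";
  std::string Out;
  Out.reserve(Bytes.size() * 2);
  for (uint8_t B : Bytes) {
    Out.push_back(Digits[B >> 4]);
    Out.push_back(Digits[B & 0xF]);
  }
  return Out;
}

// Before: the caller supplies an output buffer and the hex string is copied
// into it.
void computeKeyInto(std::string &Key, const std::vector<uint8_t> &Digest) {
  assert(!Digest.empty() && "expected a non-empty digest");
  std::string Hex = toHexSketch(Digest);
  Key = Hex; // Extra copy of the character buffer.
}

// After: return the encoder's result directly; it is moved or elided.
std::string computeKey(const std::vector<uint8_t> &Digest) {
  assert(!Digest.empty() && "expected a non-empty digest");
  return toHexSketch(Digest);
}
```

Both functions produce the same key; the second simply avoids the intermediate copy and gives callers a value they can store or concatenate directly.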
2024-08-09 | [LTO] enable `ObjCARCContractPass` only on optimized build (#101114) | Peter Rong | 2 | -8/+0
#92331 tried to run `ObjCARCContractPass` by default, but it caused a regression on O0 builds and was reverted. This patch tries to bring that back by:
1. Reverting the [revert](https://github.com/llvm/llvm-project/commit/1579e9ca9ce17364963861517fecf13b00fe4d8a).
2. Calling `createObjCARCContractPass` only on optimized builds.
Tests are updated to reflect the changes. Specifically, all `O0` tests should not include `ObjCARCContractPass`.
Signed-off-by: Peter Rong <PeterRong@meta.com>
2024-07-29 | [lld][LTO] Teach LTO to print pipeline passes (#101018) | macurtis-amd | 1 | -0/+10
I found this useful while debugging code generation differences between old and new offloading drivers. No functional change (intended).
2024-07-20 | Reapply "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)" | Joseph Huber | 1 | -8/+7
This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5. I moved the `ISD` dependencies into the CodeGen portion of the handling; it's a little awkward but it's the easiest solution I can think of for now.
2024-07-20 | Reformat | NAKAMURA Takumi | 1 | -1/+1
2024-07-20 | Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)" | NAKAMURA Takumi | 1 | -7/+7
This reverts commit c05126bdfc3b02daa37d11056fa43db1a6cdef69. (llvmorg-19-init-17714-gc05126bdfc3b) See #99610.
2024-07-16 | [LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512) | Joseph Huber | 1 | -7/+7
Summary: The LTO pass and LLD linker have logic in them that forces extraction and prevents internalization of needed runtime calls. However, these currently take all RTLibcalls into account, even if the target does not support them. The target opts out of a libcall if it sets its name to nullptr. This patch pulls this logic out into a class in the header so that LTO / lld can use it to determine if a symbol actually needs to be kept. This is important for targets like AMDGPU that want to be able to use `lld` to perform the final link step but do not want the overhead of uncalled functions. (This trivially adds about a second to the link time.)
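The opt-out convention (a null name means the target does not support that libcall) can be sketched as below. `LibcallTableSketch` is a hypothetical class for illustration and is not the class this patch adds to the LLVM headers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-target table with one slot per runtime library call.
// A null name means the target has opted out of that libcall.
class LibcallTableSketch {
  std::vector<const char *> Names;

public:
  explicit LibcallTableSketch(std::size_t NumCalls)
      : Names(NumCalls, nullptr) {}

  void setName(std::size_t Call, const char *Name) {
    assert(Call < Names.size() && "libcall index out of range");
    Names[Call] = Name;
  }

  // Only libcalls the target actually supports need to be preserved by the
  // linker; opted-out slots are skipped.
  std::vector<std::string> symbolsToPreserve() const {
    std::vector<std::string> Kept;
    for (const char *Name : Names)
      if (Name)
        Kept.emplace_back(Name);
    return Kept;
  }
};
```

A linker consulting such a table would force extraction only for the returned symbols instead of for every known runtime call.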
2024-07-08 | Reland "[ThinLTO][Bitcode] Generate import type in bitcode" (#97253) | Mingming Liu | 2 | -6/+12
https://github.com/llvm/llvm-project/pull/87600 was reverted in order to revert https://github.com/llvm/llvm-project/commit/6262763341fcd71a2b0708cf7485f9abd1d26ba8. Now https://github.com/llvm/llvm-project/pull/95482 is a fix-forward for https://github.com/llvm/llvm-project/commit/6262763341fcd71a2b0708cf7485f9abd1d26ba8. This patch is a reland of https://github.com/llvm/llvm-project/pull/87600.
**Changes on top of original patch**
In `llvm/include/llvm/IR/ModuleSummaryIndex.h`, make the type of `GVSummaryPtrSet` an `unordered_set`, which is more memory efficient when the number of elements is smaller than 128 [1].
**Original commit message**
For distributed ThinLTO, the LTO indexing step generates a combined summary for each module, and the postlink pipeline reads the combined summary, which stores the information for link-time optimization. This patch populates the 'import type' of a summary in bitcode, and updates the bitcode reader to parse the bit correctly.
[1] https://github.com/llvm/llvm-project/blob/393eff4e02e7ab3d234d246a8d6912c8e745e6f9/llvm/lib/Support/SmallPtrSet.cpp#L43
2024-07-03 | [ThinLTO] Use a set rather than a map to track exported ValueInfos. (#97360) | Mingming Liu | 1 | -8/+6
https://github.com/llvm/llvm-project/pull/95482 is a reland of https://github.com/llvm/llvm-project/pull/88024. https://github.com/llvm/llvm-project/pull/95482 keeps indexing memory usage reasonable by using unordered_map and doesn't make other changes to originally reviewed code.
While discussing possible ways to minimize indexing memory usage, Teresa asked whether I need `ExportSetTy` as a map or whether a set is sufficient. This PR implements the idea. It uses a set rather than a map to track exposed ValueInfos.
Currently, `ExportLists` has two use cases, and neither needs to track a ValueInfo's import/export status. So using a set is sufficient and correct.
1) In both in-process and distributed ThinLTO, it's used to decide if a function or global variable is visible [1] from another module after importing creates additional cross-module references.
   * If a cross-module call edge is seen today, the callee must be visible to another module without keeping track of its export status already. For instance, this [2] is how callees of direct calls get exported.
2) For in-process ThinLTO [3], it's used to compute the LTO cache key.
   * The cache key computation already hashes [4] 'ImportList', and 'ExportList' is determined by 'ImportList'. So it's fine to not track 'import type' for the export list.
[1] https://github.com/llvm/llvm-project/blob/66cd8ec4c08252ebc73c82e4883a8da247ed146b/llvm/lib/LTO/LTO.cpp#L1815-L1819
[2] https://github.com/llvm/llvm-project/blob/66cd8ec4c08252ebc73c82e4883a8da247ed146b/llvm/lib/LTO/LTO.cpp#L1783-L1794
[3] https://github.com/llvm/llvm-project/blob/66cd8ec4c08252ebc73c82e4883a8da247ed146b/llvm/lib/LTO/LTO.cpp#L1494-L1496
[4] https://github.com/llvm/llvm-project/blob/b76100e220591fab2bf0a4917b216439f7aa4b09/llvm/lib/LTO/LTO.cpp#L194-L222
2024-06-28 | [IR] Don't include Module.h in Analysis.h (NFC) (#97023) | Nikita Popov | 1 | -1/+2
Replace it with a forward declaration instead. Analysis.h is pulled in by all passes, but not all passes need to access the module.
2024-06-26 | [LTO] Avoid assert fail on failed pass plugin load (#96691) | Joel E. Denny | 1 | -6/+2
Without this patch, passing -load-pass-plugin=nonexistent.so to llvm-lto2 produces a backtrace because LTOBackend.cpp does not handle the error correctly:
```
Failed to load passes from 'nonexistant.so'. Request ignored. Expected<T> must be checked before access or destruction. Unchecked Expected<T> contained error: Could not load library 'nonexistant.so': nonexistant.so: cannot open shared object file: No such file or directoryPLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
```
Any tool using `lto::Config::PassPlugins` should suffer similarly.
Based on the message "Request ignored" and the continue statement, the intention was apparently to continue on failure to load a plugin. However, no one appears to rely on that behavior now given that it crashes instead, and terminating is consistent with opt.
2024-06-21 | [TensorSpec] Avoid JSON.h include (NFC) | Nikita Popov | 1 | -0/+1
Instead forward declare the two classes that are referenced.
2024-06-20 | Reland "[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option" (#95482) | Mingming Liu | 2 | -13/+28
Make `FunctionsToImportTy` an `unordered_map` rather than `DenseMap`. Credit goes to jvoung@ for the 'DenseMap -> unordered_map' change. This is a reland of https://github.com/llvm/llvm-project/pull/92718
* `DenseMap` allocates space for a large number of key/value pairs and wastes space when the number of elements is small.
  * While the initial bucket size is zero [1], it quickly allocates buckets for 64 elements [2] when the number of elements is small (for example, 3 or 4 elements). The programmer's manual [3] also mentions it could waste space.
* Experiments show `FunctionsToImportTy.size()` is smaller than 4 for multiple binaries with high indexing RAM usage. The growth factor of `unordered_map` is at most 2 in LLVM's libc++ [4] for insert operations.
With this change, the `ComputeCrossModuleImport` RAM increase is smaller than 0.5G on a couple of binaries with high indexing RAM usage. A wider range of (pre-release) tests pass.
[1] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L431-L432
[2] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L849
[3] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-densemap-h
[4] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/libcxx/include/__hash_table#L1525-L1526
**Original commit message**
The goal is to populate `declaration` import status if a new flag `-import-declaration` is on.
* For in-process ThinLTO, the `declaration` status is visible to the backend `function-import` pass, so `FunctionImporter::importFunctions` should read the import status and be a no-op for declaration summaries. Basically, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle `declaration` summaries. Two use cases ([better call-graph sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5) or [cross-module auto-init](https://github.com/llvm/llvm-project/pull/87597#discussion_r1556067195)) would use this bit differently.
* For distributed ThinLTO, the `declaration` status is not serialized to bitcode. As discussed, https://github.com/llvm/llvm-project/pull/87600 will do this.
2024-06-20 | [PassManager] Remove some unnecessary includes (NFC) (#96175) | Nikita Popov | 1 | -0/+2
SmallPtrSet.h and TimeProfiler.h are unused. CommandLine.h is only needed for the UseNewDbgInfoFormat declare, which can be moved to the places that need it.
2024-06-19 | [NFC][CodeGen] Remove dead ParallelCG.h/.cpp API (#95770) | Pierre van Houtryve | 1 | -1/+0
LTOBackend inlined it a while ago and now uses a static copy. This API was unused. We can always restore it at some point if it's needed, but right now it's just bloat.
2024-06-12 | [SystemZ][z/OS] Continue marking text files with OF_Text (#95111) | Abhina Sree | 1 | -1/+1
Text files should be opened with OF_Text to have the correct encoding.
2024-06-05 | Revert "Reland "[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option" (#92718)" (#94503) | Mingming Liu | 2 | -28/+13
This reverts commit e33db249b53fb70dce62db3ebd82d42239bd1d9d. The change from *set to *map increases memory usage, and caused indexing OOM in some applications. Need to profile offline to bring the memory usage down.
2024-06-05 | Revert "[ThinLTO][Bitcode] Generate import type in bitcode (#87600)" (#94502) | Mingming Liu | 2 | -12/+6
This reverts commit 6262763341fcd71a2b0708cf7485f9abd1d26ba8, to prepare for the revert of https://github.com/llvm/llvm-project/pull/92718. https://github.com/llvm/llvm-project/pull/92718 causes LTO indexing OOM in some applications.
2024-05-24 | Revert "Run ObjCContractPass in Default Codegen Pipeline (#92331)" | Nikita Popov | 2 | -0/+8
This reverts commit 8cc8e5d6c6ac9bfc888f3449f7e424678deae8c2. This reverts commit dae55c89835347a353619f506ee5c8f8a2c136a7. Causes major compile-time regressions for unoptimized builds.
2024-05-23 | Run ObjCContractPass in Default Codegen Pipeline (#92331) | Nuri Amari | 2 | -8/+0
Prior to this patch, when using -fthinlto-index= the ObjCARCContractPass isn't run prior to CodeGen, and instruction selection fails on IR containing arc intrinsics. This patch is motivated by that use case. The pass was previously added in various places where codegen is performed. This patch adds the pass to the default codegen pipeline, makes sure it bails immediately if no arc intrinsics are found, and removes the ad hoc scheduling of the pass.
Co-authored-by: Nuri Amari <nuriamari@fb.com>
2024-05-22 | [ThinLTO][Bitcode] Generate import type in bitcode (#87600) | Mingming Liu | 2 | -6/+12
For distributed ThinLTO, the LTO indexing step generates combined summary for each module, and postlink pipeline reads the combined summary which stores the information for link-time optimization. This patch populates the 'import type' of a summary in bitcode, and updates bitcode reader to parse the bit correctly.
2024-05-20 | Reland "[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option" (#92718) | Mingming Liu | 2 | -13/+28
The original PR was reviewed in https://github.com/llvm/llvm-project/pull/88024, and this PR adds one line (https://github.com/llvm/llvm-project/pull/92718/commits/b9f04d199dec4f3c221d981dcb91e55298d0693f) to fix the test.
Limit to one thread for in-process ThinLTO to test the `LLVM_DEBUG` log.
- This should fix build bot failures like https://lab.llvm.org/buildbot/#/builders/259/builds/4727 and https://lab.llvm.org/buildbot/#/builders/9/builds/43876
- I could repro the failure and see interleaved log messages by using `-thinlto-threads=all`
**Original Commit Message:**
The goal is to populate `declaration` import status if a new flag `-import-declaration` is on.
* For in-process ThinLTO, the `declaration` status is visible to the backend `function-import` pass, so `FunctionImporter::importFunctions` should read the import status and be a no-op for declaration summaries. Basically, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle `declaration` summaries. Two use cases ([better call-graph sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5) or [cross-module auto-init](https://github.com/llvm/llvm-project/pull/87597#discussion_r1556067195)) would use this bit differently.
* For distributed ThinLTO, the `declaration` status is not serialized to bitcode. As discussed, https://github.com/llvm/llvm-project/pull/87600 will do this.
2024-05-19 | [llvm] Use SmallString::str (NFC) (#92712) | Kazu Hirata | 1 | -3/+2
2024-05-19 | Revert "[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option" (#92715) | Mingming Liu | 2 | -28/+13
Reverts llvm/llvm-project#88024
Build bot failures (https://lab.llvm.org/buildbot/#/builders/259/builds/4727 and https://lab.llvm.org/buildbot/#/builders/9/builds/43876)
2024-05-19 | [ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option (#88024) | Mingming Liu | 2 | -13/+28
The goal is to populate `declaration` import status if a new flag `-import-declaration` is on.
* For in-process ThinLTO, the `declaration` status is visible to the backend `function-import` pass, so `FunctionImporter::importFunctions` should read the import status and be a no-op for declaration summaries. Basically, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle `declaration` summaries. Two use cases (better call-graph sort [1] and cross-module auto-init [2]) would use this bit differently.
* For distributed ThinLTO, the `declaration` status is not serialized to bitcode. As discussed, https://github.com/llvm/llvm-project/pull/87600 will do this.
[1] https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5
[2] https://github.com/llvm/llvm-project/pull/87597#discussion_r1556067195
2024-05-18 | [ThinLTO]Sort imported GUIDs before cache key update (#92622) | Mingming Liu | 1 | -1/+7
Add 'sort' here since it's helpful when the container type changes (for example, https://github.com/llvm/llvm-project/pull/88024 wants to change the container type from `unordered_set` to `DenseMap`).
@MaskRay points out that `std::` doesn't randomize the iteration order of `unordered_{set,map}`, and the iteration order for a single build is deterministic.
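To see why sorting makes the key independent of the container, consider this sketch. It assumes a simple FNV-1a hash in place of the real SHA1-based hasher, and `hashSortedGUIDs` and `cacheKeyFor` are hypothetical names, not functions from `LTO.cpp`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// FNV-1a over the bytes of each value; a stand-in for the real hasher.
// The result depends on element order, so callers must pass sorted input.
uint64_t hashSortedGUIDs(const std::vector<uint64_t> &GUIDs) {
  assert(std::is_sorted(GUIDs.begin(), GUIDs.end()) && "input must be sorted");
  uint64_t H = 14695981039346656037ULL; // FNV offset basis
  for (uint64_t V : GUIDs)
    for (int I = 0; I < 8; ++I) {
      H ^= (V >> (8 * I)) & 0xff;
      H *= 1099511628211ULL; // FNV prime
    }
  return H;
}

// Copy the GUIDs out of the set and sort them before hashing, so the key
// does not depend on the set's iteration order or bucket layout.
uint64_t cacheKeyFor(const std::unordered_set<uint64_t> &ImportedGUIDs) {
  std::vector<uint64_t> Sorted(ImportedGUIDs.begin(), ImportedGUIDs.end());
  std::sort(Sorted.begin(), Sorted.end());
  return hashSortedGUIDs(Sorted);
}
```

Two sets holding the same GUIDs then produce the same key even if they were filled in a different order or with a different bucket count, and the same would hold after swapping the set for another container type.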
2024-05-08 | [llvm] Use StringRef::operator== instead of StringRef::equals (NFC) (#91441) | Kazu Hirata | 1 | -1/+1
I'm planning to remove StringRef::equals in favor of StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of 70 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".
2024-04-30 | [LTO] Reset DiscardValueNames in optimize(). (#78705) | Florian Hahn | 1 | -0/+3
libLTO parses options late, so at the moment the option is ignored. To fix that, re-set it in optimize(), as at this point the options have been parsed. When LTOCodeGenerator's constructor executes, the options haven't been parsed by the linker to libLTO yet. Note that we keep the value name of `%add = add..` because when the module is imported, DiscardValueNames is still set to false (the default when building with assertions). I tried to improve this in libLTO, but I am not sure if there's a suitable callback when all options have been set. PR: https://github.com/llvm/llvm-project/pull/78705
2024-04-26 | [LTO] Remove extraneous ArrayRef (NFC) (#90306) | Kazu Hirata | 1 | -2/+2
We don't need to explicitly create these instances of ArrayRef because Hasher::update takes ArrayRef, and ArrayRef can be implicitly constructed from C arrays.
2024-04-22 | [LTO] Allow target-specific module splitting (#83128) | Pierre van Houtryve | 1 | -4/+9
Allow targets to implement custom module splitting logic for --lto-partitions. See #89245 and https://discourse.llvm.org/t/rfc-lto-target-specific-module-splittting/77252
2024-03-22 | [RemoveDIs] Load into new debug info format by default in llvm-lto and llvm-lto2 (#86271) | Orlando Cazalet-Hyams | 1 | -1/+3
Directly load all bitcode into the new debug info format in `llvm-lto` and `llvm-lto2`. This means that new-mode bitcode no longer round-trips back to old-mode after parsing, and that old-mode bitcode gets auto-upgraded to new-mode debug info (which is the current in-memory default in LLVM).