aboutsummaryrefslogtreecommitdiff
path: root/llvm/test/ThinLTO
AgeCommit message (Collapse)AuthorFilesLines
2024-01-11[MemProf] Handle missing tail call frames (#75823)Teresa Johnson2-0/+271
If tail call optimization was not disabled for the profiled binary, the call contexts will be missing frames for tail calls. Handle this by performing a limited search through tail call edges for the profiled callee when a discontinuity is detected. The search depth is adjustable but defaults to 5. If we are able to identify a short sequence of tail calls, update the graph for those calls. In the case of ThinLTO, synthesize the necessary CallsiteInfos for carrying the cloning information to the backends.
2023-12-31[FuncAttrs] Deduce `noundef` attributes for return values (#76553)Yingwei Zheng14-16/+16
This patch deduces `noundef` attributes for return values. IIUC, a function returns `noundef` values iff all of its return values are guaranteed not to be `undef` or `poison`. Definition of `noundef` from LangRef: ``` noundef This attribute applies to parameters and return values. If the value representation contains any undefined or poison bits, the behavior is undefined. Note that this does not refer to padding introduced by the type’s storage representation. ``` Alive2: https://alive2.llvm.org/ce/z/g8Eis6 Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=30dcc33c4ea3ab50397a7adbe85fe977d4a400bd&to=c5e8738d4bfbf1e97e3f455fded90b791f223d74&stat=instructions:u |stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang| |--|--|--|--|--|--|--| |+0.01%|+0.01%|-0.01%|+0.01%|+0.03%|-0.04%|+0.01%| The motivation of this patch is to reduce the number of `freeze` insts and enable more optimizations.
2023-12-19Reland the reland "[PGO][GlobalValue][LTO]In ↵Mingming Liu5-44/+44
GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. " (#75954) Simplify the compiler-rt test to make it more general for different platforms, and use `*DAG` matchers for lines that may be emitted out-of-order. - The compiler-rt test passed on a Windows machine. Previously name matchers don't work for MSVC mangling (https://lab.llvm.org/buildbot/#/builders/127/builds/59907) - `*DAG` matchers fixed the error in https://lab.llvm.org/buildbot/#/builders/94/builds/17924 This is the second reland and fixed errors caught in first reland (https://github.com/llvm/llvm-project/pull/75860) **Original commit message** Commit fe05193 (phab D156569), IRPGO names uses format `[<filepath>;]<linkage-name>` while prior format is `[<filepath>:<mangled-name>`. The format change would break the use case demonstrated in (updated) `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and `compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp` This patch changes `GlobalValues::getGlobalIdentifer` to use the semicolon. To elaborate on the scenario how things break without this PR 1. IRPGO raw profiles stores (compressed) IRPGO names of functions in one section, and per-function profile data in another section. The [NameRef](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/compiler-rt/include/profile/InstrProfData.inc#L72) field in per-function profile data is the MD5 hash of IRPGO names. 2. When raw profiles are converted to indexed format profiles, the profiled address is [mapped](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/llvm/lib/ProfileData/InstrProf.cpp#L876-L885) to the MD5 hash of the callee. 3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names will be [annotated](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L1707) as value profiles, and used to import indirect-call-prom candidates. If the annotated MD5 hash is computed from the new format while import uses the prior format, the callee cannot be imported. * `compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp` is added to have an end-to-end test. * `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` is updated to have better test coverage from another aspect (as runtime tests are more sensitive to the environment and may be skipped by some contributors)
2023-12-18Revert "Reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, ↵Mingming Liu5-44/+44
use semicolon as delimiter for local-linkage varibles. "" (#75888) Reverts llvm/llvm-project#75860 - Mangled name mismatch on Windows (https://lab.llvm.org/buildbot/#/builders/127/builds/59907/steps/8/logs/stdio)
2023-12-18Reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use ↵Mingming Liu5-44/+44
semicolon as delimiter for local-linkage varibles. " (#75860) Fixed build-bot failures caught by post-submit tests 1) Add the list of command line tools needed by new compiler-rt test into dependency. 2) Use `starts_with` to replace deprecated `startswith`. **Original commit message** Commit fe05193 (phab D156569), IRPGO names uses format `[<filepath>;]<linkage-name>` while prior format is `[<filepath>:<mangled-name>`. The format change would break the use case demonstrated in (updated) `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and `compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp` This patch changes `GlobalValues::getGlobalIdentifer` to use the semicolon. To elaborate on the scenario how things break without this PR 1. IRPGO raw profiles stores (compressed) IRPGO names of functions in one section, and per-function profile data in another section. The [NameRef](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/compiler-rt/include/profile/InstrProfData.inc#L72) field in per-function profile data is the MD5 hash of IRPGO names. 2. When raw profiles are converted to indexed format profiles, the profiled address is [mapped](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/llvm/lib/ProfileData/InstrProf.cpp#L876-L885) to the MD5 hash of the callee. 3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names will be [annotated](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L1707) as value profiles, and used to import indirect-call-prom candidates. If the annotated MD5 hash is computed from the new format while import uses the prior format, the callee cannot be imported. * `compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp` is added to have an end-to-end test. * `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` is updated to have better test coverage from another aspect (as runtime tests are more sensitive to the environment and may be skipped by some contributors)
2023-12-18Revert "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use ↵Mingming Liu5-44/+44
semicolon as delimiter for local-linkage varibles." (#75835) Reverts llvm/llvm-project#74008 The compiler-rt test failed due to `llvm-dis` not found (https://lab.llvm.org/buildbot/#/builders/127/builds/59884) Will revert and investigate how to require the proper dependency.
2023-12-18[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon ↵Mingming Liu5-44/+44
as delimiter for local-linkage varibles. (#74008) Commit fe05193 (phab D156569), IRPGO names uses format `[<filepath>;]<linkage-name>` while prior format is `[<filepath>:<mangled-name>`. The format change would break the use case demonstrated in (updated) `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and `compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp` This patch changes `GlobalValues::getGlobalIdentifer` to use the semicolon. To elaborate on the scenario how things break without this PR 1. IRPGO raw profiles stores (compressed) IRPGO names of functions in one section, and per-function profile data in another section. The [NameRef](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/compiler-rt/include/profile/InstrProfData.inc#L72) field in per-function profile data is the MD5 hash of IRPGO names. 2. When raw profiles are converted to indexed format profiles, the profiled address is [mapped](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/llvm/lib/ProfileData/InstrProf.cpp#L876-L885) to the MD5 hash of the callee. 3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names will be [annotated](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L1707) as value profiles, and used to import indirect-call-prom candidates. If the annotated MD5 hash is computed from the new format while import uses the prior format, the callee cannot be imported. *`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp` is added to have an end-to-end test. * `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` is updated to have better test coverage from another aspect (as runtime tests are more sensitive to the environment and may be skipped by some contributors)
2023-12-14[ThinLTO] Allow importing based on a workload definition (#74545)Mircea Trofin1-0/+162
An example of a "workload definition" would be "the transitive closure of functions actually called to satisfy a RPC request", i.e. a (typically significantly) smaller subset of the transitive closure (static + possible indirect call targets) of callees. This means this workload definition is a type of flat dynamic profile. Producing one is not in scope - it can be produced offline from traces, or from sample-based profiles, etc. This patch adds awareness to ThinLTO of such a concept. A workload is defined as a root and a list of functions. All function references are by-name (more readable than GUIDs). In the case of aliases, the expectation is the list contains all the alternative names. The workload definitions are presented to the linker as a json file, containing a dictionary. The keys are the roots, the values are the list of functions. The import list for a module defining a root will be the functions listed for it in the profile. Using names this way assumes unique names for internal functions, i.e. clang's `-funique-internal-linkage-names`. Note that the behavior affects the entire module where a root is defined (i.e. different workloads best be defined in different modules), and does not affect modules that don't define roots.
2023-12-06[ThinLTO] Add tail call flag to call edges in summary (#74043)Teresa Johnson2-15/+15
This adds support for a HasTailCall flag on function call edges in the ThinLTO summary. It is intended for use in aiding discovery of missing frames from tail calls in profiled call stacks for MemProf of profiled binaries that did not disable tail call elimination. A follow on change will add the use of this new flag during MemProf context disambiguation. The new flag is encoded in the bitcode along with either the hotness flag from the profile, or the relative block frequency under the -write-relbf-to-summary flag when there is no profile data. Because we now will always have some additional call edge information, I have removed the non-profile function summary record format, and we simply encode the tail call flag along with a hotness type of none when there is no profile information or relative block frequency. The change of record format and name caused most of the test case changes. I have added explicit testing of generation of the new tail call flag into the bitcode and IR assembly format as part of the changes to llvm/test/Bitcode/thinlto-function-summary-refgraph.ll. I have also added round trip testing through assembly and bitcode to llvm/test/Assembler/thinlto-summary.ll.
2023-11-30[MemProf][NFC] Simplify test case (#73979)Teresa Johnson1-136/+6
Removes some unnecessary testing of dump and dot file formats, to simplify updates to this test.
2023-11-30[ThinLTO] Don't mark calloc function dead (#72673)eleviant1-2/+11
Dead store elimination pass may fold malloc + memset calls into a single call to calloc. If calloc is not preserved and is not being called it can be marked dead which results in link error.
2023-11-15[MemProf] Look through alias when applying cloning in ThinLTO backend (#72156)Teresa Johnson1-0/+63
Mirror the handling in ModuleSummaryAnalysis to look through aliases when handling call instructions in the ThinLTO backend MemProf handling. Fixes #72094
2023-11-01[InstCombine] Avoid use of ConstantExpr::getIntegerCast()Nikita Popov1-1/+1
Require that constants are ImmConstant for this transform, as we may otherwise generate constant expressions, which are not necessarily free.
2023-10-24BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreadsMatthias Braun1-1/+1
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that: * Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiply frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room. * Spread the difference between hottest/coldest block as much as possible to increase precision. * If the hot/cold spread cannot be represented loose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.
2023-09-01[LTO] Remove module id from summary indexTeresa Johnson1-3/+3
The module paths string table mapped to both an id sequentially assigned during LTO linking, and the module hash. The former is leftover from before the module hash was added for caching and subsequently replaced use of the module id when renaming promoted symbols (to avoid affects due to link order changes). The sequentially assigned module id was not removed, however, as it was still a convenience when serializing to/from bitcode and assembly. This patch removes the module id from this table, since it isn't strictly needed and can lead to confusion on when it is appropriate to use (e.g. see fix in D156525). It also takes a (likely not significant) amount of overhead. Where an integer module id is needed (e.g. bitcode writing), one is assigned on the fly. There are a couple of test changes since the paths are now sorted alphanumerically when assigning ids on the fly during assembly writing, in order to ensure deterministic behavior. Differential Revision: https://reviews.llvm.org/D156730
2023-09-01[llvm][ThinLTO] Add missing ":" to RUN line in devirtualization testDavid Spickett1-1/+1
2023-08-29[ThinLTO] Mark callers of local ifunc not eligible for importFangrui Song1-0/+79
Fix https://github.com/llvm/llvm-project/issues/58740 The `target_clones` attribute results in ifunc on eligible targets (Linux glibc/Android or FreeBSD). If the function has internal linkage, we will get an internal linkage ifunc. ``` __attribute__((target_clones("popcnt", "default"))) static int foo(int n) { return __builtin_popcount(n); } int use(int n) { return foo(n); } @foo.ifunc = internal ifunc i32 (i32), ptr @foo.resolver define internal nonnull ptr @foo.resolver() comdat { ; local linkage comdat is another issue that should be fixed ... select i1 %.not, ptr @foo.default.1, ptr @foo.popcnt.0 ... } define internal i32 @foo.default.1(i32 noundef %n) ``` ifuncs are not included in module summaries, so LTO doesn't know the local linkage `foo.default.1` referenced by `foo.resolver` should be promoted. If a caller of `foo` (e.g. `use`) is imported, the local linkage `foo.resolver` will be cloned as a definition (IRLinker::shouldLink), leading to linker errors. ``` ld.lld: error: undefined hidden symbol: foo.default.1.llvm.8017227050314953235 >>> referenced by bar.c >>> lto.tmp:(foo.ifunc) ``` As a simple fix, just mark `use` as not eligible for import. Non-local linkage ifuncs do not have the problem, because they are not imported, and not cloned when a caller is imported. --- https://reviews.llvm.org/D82745 contains a more involved fix, though the original bug it intended to fix (https://github.com/llvm/llvm-project/issues/45833) now works. Note: importing ifunc is tricky. If we import an ifunc, we need to make sure the resolver and the implementation are in the translation unit, as required by https://sourceware.org/glibc/wiki/GNU_IFUNC > Requirement (a): Resolver must be defined in the same translation unit as the implementations. This is infeasible if the implementation is changed to available_externally. In addition, the imported ifunc may be referenced by two translation units. This doesn't work with PowerPC32 -msecure-plt (https://maskray.me/blog/2021-01-18-gnu-indirect-function). At the very least, every referencing translation unit needs one extra IRELATIVE dynamic relocation. At least for the local linkage ifunc case, it doesn't have much use outside of `target_clones`, as a global pointer is usually a better replacement. I think ifuncs just have too many pitfalls to design more IR features around it to optimize them. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D158961
2023-08-28[ThinLTO][test] Test prevailing non-exported non-ODR linkonce/odr can be ↵Fangrui Song1-1/+13
internalized To improve the tests added by D151965. Discovered when writing D158949.
2023-07-28[ThinLTO] Use module hash instead of module ID for cache keyNikita Popov3-0/+46
This is a followup to D151165. Instead of using the module ID, use the module hash for sorting the import list. The module hash is what will actually be included in the hash. This has the advantage of being independent of the module order, which is something that Rust relies on. A caveat here is that the test doesn't quite work for linkonce_odr functions, because the function may be imported from two different modules, and the first one on the llvm-lto2 command line gets picked (rather than, say, the prevailing copy). This doesn't really matter for Rust's purposes (because it does not use linkonce_odr linkage), but may still be worth addressing. For now I'm using a variant of the test using internal instead of linkonce_odr functions. Differential Revision: https://reviews.llvm.org/D156525
2023-06-16Reland "[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local ↵Vladislav Dzhidzhoev1-7/+5
imported entities (3/7)" Got rid of non-determinism in MetadataLoader.cpp. Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com> Differential Revision: https://reviews.llvm.org/D144004
2023-06-15Revert "Reland "[DebugMetadata][DwarfDebug] Fix DWARF emisson of ↵Vladislav Dzhidzhoev1-5/+7
function-local imported entities (3/7)"" This reverts commit fcc3981626821addc6c77b98006d02030b8ceb7f, since Bitcode-upgrading code doesn't seem to be deterministic.
2023-06-15Reland "[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local ↵Vladislav Dzhidzhoev1-7/+5
imported entities (3/7)" Run split-dwarf-local-impor3.ll only on x86_64-linux.
2023-06-15Revert "[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local ↵Vladislav Dzhidzhoev1-5/+7
imported entities (3/7)" This reverts commit d80fdc6fc1a6e717af1bcd7a7313e65de433ba85. split-dwarf-local-impor3.ll fails because of an issue with Dwo sections emission on Windows platform.
2023-06-15[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported ↵Vladislav Dzhidzhoev1-7/+5
entities (3/7) RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544 Fixed PR51501 (tests from D112337). 1. Reuse of DISubprogram's 'retainedNodes' to track other function-local entities together with local variables and labels (this patch cares about function-local import while D144006 and D144008 use the same approach for local types and static variables). So, effectively this patch moves ownership of tracking local import from DICompileUnit's 'imports' field to DISubprogram's 'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout is considered unsupported (DwarfDebug would assert on such debug metadata). DICompileUnit's 'imports' field is supposed to track global imported declarations as it does before. This addresses various FIXMEs and simplifies the next part of the patch. 2. Postpone emission of function-local imported entities from `DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`. While in `DwarfDebug::endFunctionImpl()` we do not have all the information about a parent subprogram or a referring subprogram (whether a subprogram inlined or not), so we can't guarantee we emit an imported entity correctly and place it in a proper subprogram tree. So now, we just gather needed details about the import itself and its parent entity (either a Subprogram or a LexicalBlock) during processing in `DwarfDebug::endFunctionImpl()`, but all the real work is done in `DwarfDebug::endModule()` when we have all the required information to make proper emission. Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com> Differential Revision: https://reviews.llvm.org/D144004
2023-06-15Revert "[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local ↵Vladislav Dzhidzhoev1-5/+7
imported entities (3/7)" This reverts commit ed578f02cf44a52adde16647150e7421f3ef70f3. Tests llvm/test/DebugInfo/Generic/split-dwarf-local-import*.ll fail when x86_64 target is not registered.
2023-06-15[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported ↵Vladislav Dzhidzhoev1-7/+5
entities (3/7) RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544 Fixed PR51501 (tests from D112337). 1. Reuse of DISubprogram's 'retainedNodes' to track other function-local entities together with local variables and labels (this patch cares about function-local import while D144006 and D144008 use the same approach for local types and static variables). So, effectively this patch moves ownership of tracking local import from DICompileUnit's 'imports' field to DISubprogram's 'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout is considered unsupported (DwarfDebug would assert on such debug metadata). DICompileUnit's 'imports' field is supposed to track global imported declarations as it does before. This addresses various FIXMEs and simplifies the next part of the patch. 2. Postpone emission of function-local imported entities from `DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`. While in `DwarfDebug::endFunctionImpl()` we do not have all the information about a parent subprogram or a referring subprogram (whether a subprogram inlined or not), so we can't guarantee we emit an imported entity correctly and place it in a proper subprogram tree. So now, we just gather needed details about the import itself and its parent entity (either a Subprogram or a LexicalBlock) during processing in `DwarfDebug::endFunctionImpl()`, but all the real work is done in `DwarfDebug::endModule()` when we have all the required information to make proper emission. Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com> Differential Revision: https://reviews.llvm.org/D144004
2023-06-15Revert "[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local ↵Vladislav Dzhidzhoev1-5/+7
imported entities (3/7)" This reverts commit d04452d54829cd7af5b43d670325ffa755ab0030 since test llvm-project/llvm/test/Bitcode/DIImportedEntity_backward.ll is broken.
2023-06-15[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported ↵Vladislav Dzhidzhoev1-7/+5
entities (3/7) RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544 Fixed PR51501 (tests from D112337). 1. Reuse of DISubprogram's 'retainedNodes' to track other function-local entities together with local variables and labels (this patch cares about function-local import while D144006 and D144008 use the same approach for local types and static variables). So, effectively this patch moves ownership of tracking local import from DICompileUnit's 'imports' field to DISubprogram's 'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout is considered unsupported (DwarfDebug would assert on such debug metadata). DICompileUnit's 'imports' field is supposed to track global imported declarations as it does before. This addresses various FIXMEs and simplifies the next part of the patch. 2. Postpone emission of function-local imported entities from `DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`. While in `DwarfDebug::endFunctionImpl()` we do not have all the information about a parent subprogram or a referring subprogram (whether a subprogram inlined or not), so we can't guarantee we emit an imported entity correctly and place it in a proper subprogram tree. So now, we just gather needed details about the import itself and its parent entity (either a Subprogram or a LexicalBlock) during processing in `DwarfDebug::endFunctionImpl()`, but all the real work is done in `DwarfDebug::endModule()` when we have all the required information to make proper emission. Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com> Differential Revision: https://reviews.llvm.org/D144004
2023-06-02[ThinLTO] Fix internalization decisions for weak/linkonce ODRTeresa Johnson3-17/+59
This fixes a runtime error that occurred due to incorrect internalization of linkonce_odr functions where function pointer equality was broken. This was hit because the prevailing copy was in a native object, so the IR copies were not exported, and the existing code internalized all of the IR copies. It could be fixed by guarding this internalization on whether the defs are (local_)unnamed_addr, meaning that their address is not significant (which we have in the summary currently for linkonce_odr via the CanAutoHide flag). Or we can propagate reference attributes as we do when determining whether a global variable is read or write-only (reference edges are annotated with whether they are read-only, write-only, or neither, and taking the address of a function would result in a reference edge to the function that is not read or write-only). However, this exposed a larger issue with the internalization handling. Looking at test cases, it appears the intent is to internalize when there is a single definition of a linkonce/weak ODR symbol (that isn't exported). This makes sense in the case of functions, because the inliner can apply its last call to static heuristic when appropriate. In the case where there is no prevailing copy in IR, internalizing all of the IR copies of a linkonce_odr, even if legal, just increases binary size. In that case it is better to fall back to the normal handling of converting all non-prevailing copies to available_externally so that they are eliminated after inlining. In the case of variables, the existing code was attempting to internalize the non-exported linkonce/weak ODR variables if they were read or write-only. While this is legal (we propagate reference attributes to determine this information), we don't even need to internalize these here as there is later separate handling that internalizes read and write-only variables when we process the module at the start of the ThinLTO backend (processGlobalForThinLTO). Instead, we can also internalize any non-exported variable when there is only one (IR) definition, which is prevailing. And in that case, we don't need to require that it is read or write-only, since we are guaranteed that all uses must use that single definition. In the new LTO API, if there are multiple defs of a linkonce or weak ODR it will be marked exported, but it isn't clear that this will always be true for the legacy LTO API. Therefore, require that there is only a single (non-local) def, and that it is prevailing. The test cases changes are both to reflect the change in the handling of linkonce_odr IR copies where the prevailing def is not in IR (the main correctness bug fix here), and to reflect the more aggressive internalization of variables when there is only a single def, it is in IR, and not exported. I've also added some additional testing via the new LTO API. Differential Revision: https://reviews.llvm.org/D151965
2023-05-22[ThinLTO] Make the cache key independent of the module identifier pathsArgyrios Kyrtzidis1-0/+27
Otherwise there are cache misses just from changing the name of a path, even though the input modules did not change. rdar://109672225 Differential Revision: https://reviews.llvm.org/D151165
2023-05-17[NFC][Py Reformat] Reformat lit.local.cfg python files in llvmTobias Hieta1-2/+2
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762
2023-05-11[WPD] Update llvm.public.type.test after importing functionsTeresa Johnson2-11/+58
I noticed that we are converting llvm.public.type.test to regular llvm.type.test too early, and thus not updating those in imported functions. This would result in losing out on WPD opportunities. Move the update to after function importing, and improve test to cover this case. Differential Revision: https://reviews.llvm.org/D150326
2023-05-10[MemProf] Update hot/cold information after importingTeresa Johnson1-11/+56
The support added by D149215 to remove memprof metadata and attributes if we don't link with an allocator supporting hot/cold operator new interfaces did not update imported code. Move the update handling later in the ThinLTO backend to just after importing, and update the test to check this case. Differential Revision: https://reviews.llvm.org/D150295
2023-05-08[MemProf] Control availability of hot/cold operator new from LTO linkTeresa Johnson8-0/+69
Adds an LTO option to indicate that whether we are linking with an allocator that supports hot/cold operator new interfaces. If not, at the start of the LTO backends any existing memprof hot/cold attributes are removed from the IR, and we also remove memprof metadata so that post-LTO inlining doesn't add any new attributes. This is done via setting a new flag in the module summary index. It is important to communicate via the index to the LTO backends so that distributed ThinLTO handles this correctly, as they are invoked by separate clang processes and the combined index is how we communicate information from the LTO link. Specifically, for distributed ThinLTO the LTO related processes look like: ``` # Thin link: $ lld --thinlto-index-only obj1.o ... objN.o -llib ... # ThinLTO backends: $ clang -x ir obj1.o -fthinlto-index=obj1.o.thinlto.bc -c -O2 ... $ clang -x ir objN.o -fthinlto-index=objN.o.thinlto.bc -c -O2 ``` It is during the thin link (lld --thinlto-index-only) that we have visibility into linker dependences and want to be able to pass the new option via -Wl,-supports-hot-cold-new. This will be recorded in the summary indexes created for the distributed backend processes (*.thinlto.bc) and queried from there, so that we don't need to know during those individual clang backends what allocation library was linked. Since in-process ThinLTO and regular LTO also use a combined index, for consistency we query the flag out of the index in all LTO backends. Additionally, when the LTO option is disabled, exit early from the MemProfContextDisambiguation handling performed during LTO, as this is unnecessary. Depends on D149117 and D149192. Differential Revision: https://reviews.llvm.org/D149215
2023-05-05[MemProf] Context disambiguation cloning pass [patch 4/4]Teresa Johnson5-26/+276
Applies ThinLTO cloning decisions made during the thin link and recorded in the summary index to the IR during the ThinLTO backend. Depends on D141077. Differential Revision: https://reviews.llvm.org/D149117
2023-05-05Restore again "[MemProf] Context disambiguation cloning pass [patch 3/4]"Teresa Johnson5-9/+148
This reverts commit f09807ca9dda2f588298d8733e89a81105c88120, restoring bfe7205975a63a605ff3faacd97fe4c1bf4c19b3 and follow on fix e3e6bc699574550f2ed1de07f4e5bcdddaa65557, now that the nondeterminism has been addressed by D149924. Differential Revision: https://reviews.llvm.org/D141077
2023-05-05[MemProf] Use MapVector to avoid non-determinismTeresa Johnson1-0/+223
Multiple cases of instability in the cloning behavior occurred due to iteration of maps indexed by pointers. Fix by changing the maps to MapVector. This necessitated adding DenseMapInfo specializations for the structure types used in the keys. These were found while trying to commit patch 3 of the cloning (bfe7205975a63a605ff3faacd97fe4c1bf4c19b3), but the second one turned out to be in code committed in patch 2, but just exposed by a new test added with patch 3. Specifically, the iteration in identifyClones(). Added the portion of the new test cases from patch 3 that only relied on the already committed changes and exposed the issue. Differential Revision: https://reviews.llvm.org/D149924
2023-05-04Revert "Restore "[MemProf] Context disambiguation cloning pass [patch 3/4]""Teresa Johnson5-369/+7
This reverts commit bfe7205975a63a605ff3faacd97fe4c1bf4c19b3, and follow on fix e3e6bc699574550f2ed1de07f4e5bcdddaa65557, due to some remaining instability exposed by the bot enabling expensive checks: https://lab.llvm.org/buildbot/#/builders/42/builds/9842
2023-05-04[MemProf] Need to require asserts for tests that use -statsTeresa Johnson5-0/+15
Follow up to bfe7205975a63a605ff3faacd97fe4c1bf4c19b3 to require asserts which is needed for the use of -stats. This showed up in the following bot failure: https://lab.llvm.org/buildbot/#/builders/91/builds/16760
2023-05-04Restore "[MemProf] Context disambiguation cloning pass [patch 3/4]"Teresa Johnson5-7/+366
This reverts commit 6fbf022908c104a380fd1854fb96eafc64509366, restoring commit bf6ff4fd4b735afffc65f92a4a79f6610e7174c3 with a fix for a bot failure due to a previously unstable iteration order. Differential Revision: https://reviews.llvm.org/D141077
2023-05-03Revert "[MemProf] Context disambiguation cloning pass [patch 3/4]"Teresa Johnson5-366/+7
This reverts commit bf6ff4fd4b735afffc65f92a4a79f6610e7174c3. There is a bot failure where we are getting the correct remarks output but in a different order. I'll need to investigate to see where we are having nondeterministic behavior.
2023-05-03[MemProf] Context disambiguation cloning pass [patch 3/4]Teresa Johnson5-7/+366
Applies cloning decisions to the IR, cloning functions and updating calls. For Regular LTO, the IR is updated directly during function assignment, whereas for ThinLTO it is recorded in the summary index (a subsequent patch will apply to the IR via the index during the ThinLTO backend. The function assignment and cloning proceeds greedily, and we create new clones as needed when we find an incompatible assignment of function clones to callsite clones (i.e. when different callers need to invoke different combinations of callsite clones). Depends on D140949. Differential Revision: https://reviews.llvm.org/D141077
2023-05-02[MemProf] Use profiled lifetime access density directlyTeresa Johnson6-6/+6
Now that the runtime tracks the lifetime access density directly, we can use that directly in the threshold checks instead of less accurately computing from other statistics. Differential Revision: https://reviews.llvm.org/D149684
2023-05-02[ThinLTO] Loosen up variable importing correctness checksTeresa Johnson1-0/+41
After importing variables, we do some checking to ensure that variables marked read or write only, which have been marked exported (e.g. because a referencing function has been exported), are on at least one module's imports list. This is because the read or write only variables will be internalized, so we need a copy any any module that references it. This checking is overly conservative in the case of linkonce_odr or other linkage types where there can already be a duplicate copy in existence in the importing module, which therefore wouldn't need to import it. Loosen up the checking for these linkage types. Fixes https://github.com/llvm/llvm-project/issues/62468. Differential Revision: https://reviews.llvm.org/D149630
2023-04-21Restore "[MemProf] Context disambiguation cloning pass [patch 2/3]"Teresa Johnson4-0/+433
This restores d0649a6ad8be778abf7569f502148d577f8bc6f1 (reverted in commit 03bf59d275a16815dc5a2e3f279815554f7cd0ca), with fixes for bot failures. Confirmed that gcc, which reproduced both failures, now builds it fine. Differential Revision: https://reviews.llvm.org/D140949
2023-04-21Revert "[MemProf] Context disambiguation cloning pass [patch 2/3]"Teresa Johnson4-433/+0
This reverts commit d0649a6ad8be778abf7569f502148d577f8bc6f1. Reverting due to a number of bot failures that need investigation.
2023-04-21[MemProf] Context disambiguation cloning pass [patch 2/3]Teresa Johnson4-0/+433
Performs cloning on the CallsiteContextGraph (not on the IR or summary index), in order to uniquely identify the allocation behavior of an allocation call given its context. In order to do this, the graph is recursively traversed starting from the allocation nodes, until we identify a point where the allocation behavior is unambiguous (the edges have a single allocation type). Nodes are then cloned as we unwind the recursion. We try to perform the minimal amount of cloning required to disambiguate the contexts. The follow-on patch will contain the support for applying the cloning to the IR. Depends on D140908 and D145836. Differential Revision: https://reviews.llvm.org/D140949
2023-04-20[ThinLTO] Remove BlockCount for non partial sample profile buildsTeresa Johnson1-2/+0
As pointed out in https://discourse.llvm.org/t/undeterministic-thin-index-file/69985, the block count added to distributed ThinLTO index files breaks incremental builds on ThinLTO - if any linked file has a different number of BBs, then the accumulated sum placed in the index files will change, causing all ThinLTO backend compiles to be redone. The block count is only used for scaling of partial sample profiles, and was added in D80403 for D79831. This patch simply removes this field from the index files of non partial sample profile compiles, which is NFC on the output of the compiler. We subsequently need to see if this can be removed for partial sample profiles without signficant performance loss, or redesigned in a way that does not destroy caching. Differential Revision: https://reviews.llvm.org/D148746
2023-04-18[PowerPC] Add function pointer alignment to DataLayoutDavid Tenty1-1/+1
The alignment of function pointers was added to the Datalayout by D57335 but currently is unset for the Power target. This will cause us to compute a conservative minimum alignment of one if places like Value::getPointerAlignment. This patch implements the function pointer alignment in the Datalayout for the Power backend and Power targets in clang, so we can query the value for a particular Power target. We come up with the correct value one of two ways: - If the target uses function descriptor objects (i.e. ELFv1 & AIX ABIs), then a function pointer points to the descriptor, so use the alignment we would emit the descriptor with. - If the target doesn't use function descriptor objects (i.e. ELFv2), a function pointer points to the global entry point, so use the minimum alignment for code on Power (i.e. 4-bytes). Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D147016
2023-03-25[ThinLTO] Only import for non-prevailing interposable global variablesShoaib Meenai2-0/+31
This logic was added in https://reviews.llvm.org/D95943 specifically to handle an issue for non-prevailing global variables. It turns out that it adds a new issue for prevailing glboal variables, since those could be replaced by an available_externally definition and hence incorrectly omitted from the output object file. Limit the import to non-prevailing global variables to fix this, as suggested by @tejohnson. The bulk of the diff is mechanical changes to thread isPrevailing through to where it's needed and ensure it's available before the relevant calls; the actual logic change itself is straightforward. Fixes https://github.com/llvm/llvm-project/issues/61677 Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D146876