Age | Commit message (Collapse) | Author | Files | Lines |
|
If tail call optimization was not disabled for the profiled binary, the
call contexts will be missing frames for tail calls. Handle this by
performing a limited search through tail call edges for the profiled
callee when a discontinuity is detected. The search depth is adjustable
but defaults to 5.
If we are able to identify a short sequence of tail calls, update the
graph for those calls. In the case of ThinLTO, synthesize the necessary
CallsiteInfos for carrying the cloning information to the backends.
|
|
This patch deduces `noundef` attributes for return values.
IIUC, a function returns `noundef` values iff all of its return values
are guaranteed not to be `undef` or `poison`.
Definition of `noundef` from LangRef:
```
noundef
This attribute applies to parameters and return values. If the value representation contains any
undefined or poison bits, the behavior is undefined. Note that this does not refer to padding
introduced by the type’s storage representation.
```
Alive2: https://alive2.llvm.org/ce/z/g8Eis6
Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=30dcc33c4ea3ab50397a7adbe85fe977d4a400bd&to=c5e8738d4bfbf1e97e3f455fded90b791f223d74&stat=instructions:u
|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.01%|+0.01%|-0.01%|+0.01%|+0.03%|-0.04%|+0.01%|
The motivation of this patch is to reduce the number of `freeze` insts
and enable more optimizations.
|
|
GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. " (#75954)
Simplify the compiler-rt test to make it more general for different
platforms, and use `*DAG` matchers for lines that may be emitted
out-of-order.
- The compiler-rt test passed on a Windows machine. Previously name
matchers don't work for MSVC mangling
(https://lab.llvm.org/buildbot/#/builders/127/builds/59907)
- `*DAG` matchers fixed the error in
https://lab.llvm.org/buildbot/#/builders/94/builds/17924
This is the second reland and fixed errors caught in first reland
(https://github.com/llvm/llvm-project/pull/75860)
**Original commit message**
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.
To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/compiler-rt/include/profile/InstrProfData.inc#L72)
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/llvm/lib/ProfileData/InstrProf.cpp#L876-L885)
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L1707)
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.
*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
|
|
use semicolon as delimiter for local-linkage varibles. "" (#75888)
Reverts llvm/llvm-project#75860
- Mangled name mismatch on Windows
(https://lab.llvm.org/buildbot/#/builders/127/builds/59907/steps/8/logs/stdio)
|
|
semicolon as delimiter for local-linkage varibles. " (#75860)
Fixed build-bot failures caught by post-submit tests
1) Add the list of command line tools needed by new compiler-rt test into dependency.
2) Use `starts_with` to replace deprecated `startswith`.
**Original commit message**
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.
To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/compiler-rt/include/profile/InstrProfData.inc#L72)
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/llvm/lib/ProfileData/InstrProf.cpp#L876-L885)
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L1707)
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.
*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
|
|
semicolon as delimiter for local-linkage varibles." (#75835)
Reverts llvm/llvm-project#74008
The compiler-rt test failed due to `llvm-dis` not found
(https://lab.llvm.org/buildbot/#/builders/127/builds/59884)
Will revert and investigate how to require the proper dependency.
|
|
as delimiter for local-linkage varibles. (#74008)
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.
To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/compiler-rt/include/profile/InstrProfData.inc#L72)
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/llvm/lib/ProfileData/InstrProf.cpp#L876-L885)
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L1707)
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.
*`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
|
|
An example of a "workload definition" would be "the transitive closure of functions actually called to satisfy a RPC request", i.e. a (typically significantly) smaller subset of the transitive closure (static + possible indirect call targets) of callees. This means this workload definition is a type of flat dynamic profile.
Producing one is not in scope - it can be produced offline from traces, or from sample-based profiles, etc.
This patch adds awareness to ThinLTO of such a concept. A workload is defined as a root and a list of functions. All function references are by-name (more readable than GUIDs). In the case of aliases, the expectation is the list contains all the alternative names.
The workload definitions are presented to the linker as a json file, containing a dictionary. The keys are the roots, the values are the list of functions.
The import list for a module defining a root will be the functions listed for it in the profile.
Using names this way assumes unique names for internal functions, i.e. clang's `-funique-internal-linkage-names`.
Note that the behavior affects the entire module where a root is defined (i.e. different workloads best be defined in different modules), and does not affect modules that don't define roots.
|
|
This adds support for a HasTailCall flag on function call edges in the
ThinLTO summary. It is intended for use in aiding discovery of missing
frames from tail calls in profiled call stacks for MemProf of profiled
binaries that did not disable tail call elimination. A follow on change
will add the use of this new flag during MemProf context disambiguation.
The new flag is encoded in the bitcode along with either the hotness
flag from the profile, or the relative block frequency under the
-write-relbf-to-summary flag when there is no profile data.
Because we now will always have some additional call edge information, I
have removed the non-profile function summary record format, and we
simply encode the tail call flag along with a hotness type of none when
there is no profile information or relative block frequency. The change
of record format and name caused most of the test case changes.
I have added explicit testing of generation of the new tail call flag
into the bitcode and IR assembly format as part of the changes to
llvm/test/Bitcode/thinlto-function-summary-refgraph.ll. I have also
added round trip testing through assembly and bitcode to
llvm/test/Assembler/thinlto-summary.ll.
|
|
Removes some unnecessary testing of dump and dot file formats, to
simplify updates to this test.
|
|
Dead store elimination pass may fold malloc + memset calls into a single
call to calloc. If calloc is not preserved and is not being called
it can be marked dead which results in link error.
|
|
Mirror the handling in ModuleSummaryAnalysis to look through aliases
when handling call instructions in the ThinLTO backend MemProf handling.
Fixes #72094
|
|
Require that constants are ImmConstant for this transform, as we
may otherwise generate constant expressions, which are not
necessarily free.
|
|
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that:
* Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiply frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between hottest/coldest block as much as possible to increase precision.
* If the hot/cold spread cannot be represented loose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.
|
|
The module paths string table mapped to both an id sequentially assigned
during LTO linking, and the module hash. The former is leftover from
before the module hash was added for caching and subsequently replaced
use of the module id when renaming promoted symbols (to avoid affects
due to link order changes). The sequentially assigned module id was not
removed, however, as it was still a convenience when serializing to/from
bitcode and assembly.
This patch removes the module id from this table, since it isn't
strictly needed and can lead to confusion on when it is appropriate to
use (e.g. see fix in D156525). It also takes a (likely not significant)
amount of overhead. Where an integer module id is needed (e.g. bitcode
writing), one is assigned on the fly.
There are a couple of test changes since the paths are now sorted
alphanumerically when assigning ids on the fly during assembly writing,
in order to ensure deterministic behavior.
Differential Revision: https://reviews.llvm.org/D156730
|
|
|
|
Fix https://github.com/llvm/llvm-project/issues/58740
The `target_clones` attribute results in ifunc on eligible targets
(Linux glibc/Android or FreeBSD). If the function has internal linkage,
we will get an internal linkage ifunc.
```
__attribute__((target_clones("popcnt", "default")))
static int foo(int n) { return __builtin_popcount(n); }
int use(int n) { return foo(n); }
@foo.ifunc = internal ifunc i32 (i32), ptr @foo.resolver
define internal nonnull ptr @foo.resolver() comdat {
; local linkage comdat is another issue that should be fixed
...
select i1 %.not, ptr @foo.default.1, ptr @foo.popcnt.0
...
}
define internal i32 @foo.default.1(i32 noundef %n)
```
ifuncs are not included in module summaries, so LTO doesn't know the
local linkage `foo.default.1` referenced by `foo.resolver`
should be promoted. If a caller of `foo` (e.g. `use`) is imported,
the local linkage `foo.resolver` will be cloned as a definition
(IRLinker::shouldLink), leading to linker errors.
```
ld.lld: error: undefined hidden symbol: foo.default.1.llvm.8017227050314953235
>>> referenced by bar.c
>>> lto.tmp:(foo.ifunc)
```
As a simple fix, just mark `use` as not eligible for import. Non-local
linkage ifuncs do not have the problem, because they are not imported,
and not cloned when a caller is imported.
---
https://reviews.llvm.org/D82745 contains a more involved fix, though the
original bug it intended to fix
(https://github.com/llvm/llvm-project/issues/45833) now works.
Note: importing ifunc is tricky.
If we import an ifunc, we need to make sure the resolver and the
implementation are in the translation unit, as required by
https://sourceware.org/glibc/wiki/GNU_IFUNC
> Requirement (a): Resolver must be defined in the same translation unit as the implementations.
This is infeasible if the implementation is changed to
available_externally.
In addition, the imported ifunc may be referenced by two translation
units. This doesn't work with PowerPC32 -msecure-plt
(https://maskray.me/blog/2021-01-18-gnu-indirect-function).
At the very least, every referencing translation unit needs one extra
IRELATIVE dynamic relocation.
At least for the local linkage ifunc case, it doesn't have much use
outside of `target_clones`, as a global pointer is usually a better
replacement.
I think ifuncs just have too many pitfalls to design more IR features
around it to optimize them.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D158961
|
|
internalized
To improve the tests added by D151965. Discovered when writing D158949.
|
|
This is a followup to D151165. Instead of using the module ID, use
the module hash for sorting the import list. The module hash is what
will actually be included in the hash.
This has the advantage of being independent of the module order,
which is something that Rust relies on.
A caveat here is that the test doesn't quite work for linkonce_odr
functions, because the function may be imported from two different
modules, and the first one on the llvm-lto2 command line gets picked
(rather than, say, the prevailing copy). This doesn't really matter
for Rust's purposes (because it does not use linkonce_odr linkage),
but may still be worth addressing. For now I'm using a variant of
the test using internal instead of linkonce_odr functions.
Differential Revision: https://reviews.llvm.org/D156525
|
|
imported entities (3/7)"
Got rid of non-determinism in MetadataLoader.cpp.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144004
|
|
function-local imported entities (3/7)""
This reverts commit fcc3981626821addc6c77b98006d02030b8ceb7f,
since Bitcode-upgrading code doesn't seem to be deterministic.
|
|
imported entities (3/7)"
Run split-dwarf-local-impor3.ll only on x86_64-linux.
|
|
imported entities (3/7)"
This reverts commit d80fdc6fc1a6e717af1bcd7a7313e65de433ba85.
split-dwarf-local-impor3.ll fails because of an issue with
Dwo sections emission on Windows platform.
|
|
entities (3/7)
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Fixed PR51501 (tests from D112337).
1. Reuse of DISubprogram's 'retainedNodes' to track other function-local
entities together with local variables and labels (this patch cares about
function-local import while D144006 and D144008 use the same approach for
local types and static variables). So, effectively this patch moves ownership
of tracking local import from DICompileUnit's 'imports' field to DISubprogram's
'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout
is considered unsupported (DwarfDebug would assert on such debug metadata).
DICompileUnit's 'imports' field is supposed to track global imported
declarations as it does before.
This addresses various FIXMEs and simplifies the next part of the patch.
2. Postpone emission of function-local imported entities from
`DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
While in `DwarfDebug::endFunctionImpl()` we do not have all the
information about a parent subprogram or a referring subprogram
(whether a subprogram inlined or not), so we can't guarantee we emit
an imported entity correctly and place it in a proper subprogram tree.
So now, we just gather needed details about the import itself and its
parent entity (either a Subprogram or a LexicalBlock) during
processing in `DwarfDebug::endFunctionImpl()`, but all the real work is
done in `DwarfDebug::endModule()` when we have all the required
information to make proper emission.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144004
|
|
imported entities (3/7)"
This reverts commit ed578f02cf44a52adde16647150e7421f3ef70f3.
Tests llvm/test/DebugInfo/Generic/split-dwarf-local-import*.ll fail
when x86_64 target is not registered.
|
|
entities (3/7)
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Fixed PR51501 (tests from D112337).
1. Reuse of DISubprogram's 'retainedNodes' to track other function-local
entities together with local variables and labels (this patch cares about
function-local import while D144006 and D144008 use the same approach for
local types and static variables). So, effectively this patch moves ownership
of tracking local import from DICompileUnit's 'imports' field to DISubprogram's
'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout
is considered unsupported (DwarfDebug would assert on such debug metadata).
DICompileUnit's 'imports' field is supposed to track global imported
declarations as it does before.
This addresses various FIXMEs and simplifies the next part of the patch.
2. Postpone emission of function-local imported entities from
`DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
While in `DwarfDebug::endFunctionImpl()` we do not have all the
information about a parent subprogram or a referring subprogram
(whether a subprogram inlined or not), so we can't guarantee we emit
an imported entity correctly and place it in a proper subprogram tree.
So now, we just gather needed details about the import itself and its
parent entity (either a Subprogram or a LexicalBlock) during
processing in `DwarfDebug::endFunctionImpl()`, but all the real work is
done in `DwarfDebug::endModule()` when we have all the required
information to make proper emission.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144004
|
|
imported entities (3/7)"
This reverts commit d04452d54829cd7af5b43d670325ffa755ab0030 since
test llvm-project/llvm/test/Bitcode/DIImportedEntity_backward.ll is broken.
|
|
entities (3/7)
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Fixed PR51501 (tests from D112337).
1. Reuse of DISubprogram's 'retainedNodes' to track other function-local
entities together with local variables and labels (this patch cares about
function-local import while D144006 and D144008 use the same approach for
local types and static variables). So, effectively this patch moves ownership
of tracking local import from DICompileUnit's 'imports' field to DISubprogram's
'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout
is considered unsupported (DwarfDebug would assert on such debug metadata).
DICompileUnit's 'imports' field is supposed to track global imported
declarations as it does before.
This addresses various FIXMEs and simplifies the next part of the patch.
2. Postpone emission of function-local imported entities from
`DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
While in `DwarfDebug::endFunctionImpl()` we do not have all the
information about a parent subprogram or a referring subprogram
(whether a subprogram inlined or not), so we can't guarantee we emit
an imported entity correctly and place it in a proper subprogram tree.
So now, we just gather needed details about the import itself and its
parent entity (either a Subprogram or a LexicalBlock) during
processing in `DwarfDebug::endFunctionImpl()`, but all the real work is
done in `DwarfDebug::endModule()` when we have all the required
information to make proper emission.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144004
|
|
This fixes a runtime error that occurred due to incorrect
internalization of linkonce_odr functions where function pointer
equality was broken. This was hit because the prevailing copy was in a
native object, so the IR copies were not exported, and the existing code
internalized all of the IR copies. It could be fixed by guarding this
internalization on whether the defs are (local_)unnamed_addr, meaning
that their address is not significant (which we have in the summary
currently for linkonce_odr via the CanAutoHide flag). Or we can
propagate reference attributes as we do when determining whether a
global variable is read or write-only (reference edges are annotated
with whether they are read-only, write-only, or neither, and taking the
address of a function would result in a reference edge to the function
that is not read or write-only).
However, this exposed a larger issue with the internalization handling.
Looking at test cases, it appears the intent is to internalize when
there is a single definition of a linkonce/weak ODR symbol (that isn't
exported). This makes sense in the case of functions, because the
inliner can apply its last call to static heuristic when appropriate. In
the case where there is no prevailing copy in IR, internalizing all of
the IR copies of a linkonce_odr, even if legal, just increases binary
size. In that case it is better to fall back to the normal handling of
converting all non-prevailing copies to available_externally so that
they are eliminated after inlining.
In the case of variables, the existing code was attempting to
internalize the non-exported linkonce/weak ODR variables if they were
read or write-only. While this is legal (we propagate reference
attributes to determine this information), we don't even need to
internalize these here as there is later separate handling that
internalizes read and write-only variables when we process the module at
the start of the ThinLTO backend (processGlobalForThinLTO). Instead, we
can also internalize any non-exported variable when there is only one
(IR) definition, which is prevailing. And in that case, we don't need to
require that it is read or write-only, since we are guaranteed that all
uses must use that single definition.
In the new LTO API, if there are multiple defs of a linkonce or weak ODR
it will be marked exported, but it isn't clear that this will always be
true for the legacy LTO API. Therefore, require that there is only a
single (non-local) def, and that it is prevailing.
The test cases changes are both to reflect the change in the handling of
linkonce_odr IR copies where the prevailing def is not in IR (the main
correctness bug fix here), and to reflect the more aggressive
internalization of variables when there is only a single def, it is in
IR, and not exported.
I've also added some additional testing via the new LTO API.
Differential Revision: https://reviews.llvm.org/D151965
|
|
Otherwise there are cache misses just from changing the name of a path, even though the input modules did not change.
rdar://109672225
Differential Revision: https://reviews.llvm.org/D151165
|
|
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
|
|
I noticed that we are converting llvm.public.type.test to regular
llvm.type.test too early, and thus not updating those in imported
functions. This would result in losing out on WPD opportunities. Move
the update to after function importing, and improve test to cover this
case.
Differential Revision: https://reviews.llvm.org/D150326
|
|
The support added by D149215 to remove memprof metadata and attributes
if we don't link with an allocator supporting hot/cold operator new
interfaces did not update imported code. Move the update handling later
in the ThinLTO backend to just after importing, and update the test to
check this case.
Differential Revision: https://reviews.llvm.org/D150295
|
|
Adds an LTO option to indicate that whether we are linking with an
allocator that supports hot/cold operator new interfaces. If not,
at the start of the LTO backends any existing memprof hot/cold
attributes are removed from the IR, and we also remove memprof metadata
so that post-LTO inlining doesn't add any new attributes.
This is done via setting a new flag in the module summary index. It is
important to communicate via the index to the LTO backends so that
distributed ThinLTO handles this correctly, as they are invoked by
separate clang processes and the combined index is how we communicate
information from the LTO link. Specifically, for distributed ThinLTO the
LTO related processes look like:
```
# Thin link:
$ lld --thinlto-index-only obj1.o ... objN.o -llib ...
# ThinLTO backends:
$ clang -x ir obj1.o -fthinlto-index=obj1.o.thinlto.bc -c -O2
...
$ clang -x ir objN.o -fthinlto-index=objN.o.thinlto.bc -c -O2
```
It is during the thin link (lld --thinlto-index-only) that we have
visibility into linker dependences and want to be able to pass the new
option via -Wl,-supports-hot-cold-new. This will be recorded in the
summary indexes created for the distributed backend processes
(*.thinlto.bc) and queried from there, so that we don't need to know
during those individual clang backends what allocation library was
linked. Since in-process ThinLTO and regular LTO also use a combined
index, for consistency we query the flag out of the index in all LTO
backends.
Additionally, when the LTO option is disabled, exit early from the
MemProfContextDisambiguation handling performed during LTO, as this is
unnecessary.
Depends on D149117 and D149192.
Differential Revision: https://reviews.llvm.org/D149215
|
|
Applies ThinLTO cloning decisions made during the thin link and
recorded in the summary index to the IR during the ThinLTO backend.
Depends on D141077.
Differential Revision: https://reviews.llvm.org/D149117
|
|
This reverts commit f09807ca9dda2f588298d8733e89a81105c88120, restoring
bfe7205975a63a605ff3faacd97fe4c1bf4c19b3 and follow on fix
e3e6bc699574550f2ed1de07f4e5bcdddaa65557, now that the nondeterminism
has been addressed by D149924.
Differential Revision: https://reviews.llvm.org/D141077
|
|
Multiple cases of instability in the cloning behavior occurred due to
iteration of maps indexed by pointers. Fix by changing the maps to
MapVector. This necessitated adding DenseMapInfo specializations for the
structure types used in the keys.
These were found while trying to commit patch 3 of the cloning
(bfe7205975a63a605ff3faacd97fe4c1bf4c19b3), but the second one turned
out to be in code committed in patch 2, but just exposed by a new test
added with patch 3. Specifically, the iteration in identifyClones().
Added the portion of the new test cases from patch 3 that only relied on
the already committed changes and exposed the issue.
Differential Revision: https://reviews.llvm.org/D149924
|
|
This reverts commit bfe7205975a63a605ff3faacd97fe4c1bf4c19b3, and follow
on fix e3e6bc699574550f2ed1de07f4e5bcdddaa65557, due to some remaining
instability exposed by the bot enabling expensive checks:
https://lab.llvm.org/buildbot/#/builders/42/builds/9842
|
|
Follow up to bfe7205975a63a605ff3faacd97fe4c1bf4c19b3 to require asserts
which is needed for the use of -stats. This showed up in the following
bot failure: https://lab.llvm.org/buildbot/#/builders/91/builds/16760
|
|
This reverts commit 6fbf022908c104a380fd1854fb96eafc64509366, restoring
commit bf6ff4fd4b735afffc65f92a4a79f6610e7174c3 with a fix for a bot
failure due to a previously unstable iteration order.
Differential Revision: https://reviews.llvm.org/D141077
|
|
This reverts commit bf6ff4fd4b735afffc65f92a4a79f6610e7174c3.
There is a bot failure where we are getting the correct remarks output
but in a different order. I'll need to investigate to see where we are
having nondeterministic behavior.
|
|
Applies cloning decisions to the IR, cloning functions and updating
calls. For Regular LTO, the IR is updated directly during function
assignment, whereas for ThinLTO it is recorded in the summary index
(a subsequent patch will apply to the IR via the index during the
ThinLTO backend.
The function assignment and cloning proceeds greedily, and we create new
clones as needed when we find an incompatible assignment of function
clones to callsite clones (i.e. when different callers need to invoke
different combinations of callsite clones).
Depends on D140949.
Differential Revision: https://reviews.llvm.org/D141077
|
|
Now that the runtime tracks the lifetime access density directly, we can
use that directly in the threshold checks instead of less accurately
computing from other statistics.
Differential Revision: https://reviews.llvm.org/D149684
|
|
After importing variables, we do some checking to ensure that variables
marked read or write only, which have been marked exported (e.g.
because a referencing function has been exported), are on at least one
module's imports list. This is because the read or write only variables
will be internalized, so we need a copy any any module that references
it.
This checking is overly conservative in the case of linkonce_odr or
other linkage types where there can already be a duplicate copy in
existence in the importing module, which therefore wouldn't need to
import it. Loosen up the checking for these linkage types.
Fixes https://github.com/llvm/llvm-project/issues/62468.
Differential Revision: https://reviews.llvm.org/D149630
|
|
This restores d0649a6ad8be778abf7569f502148d577f8bc6f1 (reverted in
commit 03bf59d275a16815dc5a2e3f279815554f7cd0ca), with fixes for bot
failures. Confirmed that gcc, which reproduced both failures, now
builds it fine.
Differential Revision: https://reviews.llvm.org/D140949
|
|
This reverts commit d0649a6ad8be778abf7569f502148d577f8bc6f1.
Reverting due to a number of bot failures that need investigation.
|
|
Performs cloning on the CallsiteContextGraph (not on the IR or summary
index), in order to uniquely identify the allocation behavior of an
allocation call given its context. In order to do this, the graph is
recursively traversed starting from the allocation nodes, until we
identify a point where the allocation behavior is unambiguous (the edges
have a single allocation type). Nodes are then cloned as we unwind the
recursion. We try to perform the minimal amount of cloning required to
disambiguate the contexts.
The follow-on patch will contain the support for applying the cloning to
the IR.
Depends on D140908 and D145836.
Differential Revision: https://reviews.llvm.org/D140949
|
|
As pointed out in
https://discourse.llvm.org/t/undeterministic-thin-index-file/69985, the
block count added to distributed ThinLTO index files breaks incremental
builds on ThinLTO - if any linked file has a different number of BBs,
then the accumulated sum placed in the index files will change, causing
all ThinLTO backend compiles to be redone.
The block count is only used for scaling of partial sample profiles, and
was added in D80403 for D79831.
This patch simply removes this field from the index files of non partial
sample profile compiles, which is NFC on the output of the compiler.
We subsequently need to see if this can be removed for partial sample
profiles without signficant performance loss, or redesigned in a way
that does not destroy caching.
Differential Revision: https://reviews.llvm.org/D148746
|
|
The alignment of function pointers was added to the Datalayout by
D57335 but currently is unset for the Power target. This will cause us
to compute a conservative minimum alignment of one if places like
Value::getPointerAlignment.
This patch implements the function pointer alignment in the Datalayout
for the Power backend and Power targets in clang, so we can query the
value for a particular Power target.
We come up with the correct value one of two ways:
- If the target uses function descriptor objects (i.e. ELFv1 & AIX ABIs),
then a function pointer points to the descriptor, so use the alignment
we would emit the descriptor with.
- If the target doesn't use function descriptor objects (i.e. ELFv2), a
function pointer points to the global entry point, so use the minimum
alignment for code on Power (i.e. 4-bytes).
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D147016
|
|
This logic was added in https://reviews.llvm.org/D95943 specifically to
handle an issue for non-prevailing global variables. It turns out that
it adds a new issue for prevailing glboal variables, since those could
be replaced by an available_externally definition and hence incorrectly
omitted from the output object file. Limit the import to non-prevailing
global variables to fix this, as suggested by @tejohnson.
The bulk of the diff is mechanical changes to thread isPrevailing
through to where it's needed and ensure it's available before the
relevant calls; the actual logic change itself is straightforward.
Fixes https://github.com/llvm/llvm-project/issues/61677
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D146876
|