Age | Commit message (Collapse) | Author | Files | Lines |
|
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/ProfileData`
library. These annotations currently have no meaningful impact on the
LLVM build; however, they are a prerequisite to support an LLVM Windows
DLL (shared library) build.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.
The following manual adjustments were also applied after running IDS on
Linux:
- Manually annotate the file
`llvm/include/llvm/ProfileData/InstrProfData.inc` because it is skipped
by IDS
- Add `LLVM_TEMPLATE_ABI` and `LLVM_EXPORT_TEMPLATE` to exported
instantiated templates.
- Add `LLVM_ABI_FRIEND` to friend member functions declared with
`LLVM_ABI`
- Add `LLVM_ABI` to a small number of symbols that require export but
are not declared in headers
## Validation
Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
|
|
Address post-commit review comments from PR141805. Misc cleanup but the
biggest changes are moving some common utilities to new MemProfCommon
files to reduce unnecessary includes.
|
|
This patch adds support for a basic MemProf summary section, which is
built along with the indexed MemProf profile (e.g. when reading the raw
or YAML profiles), and serialized through the indexed profile just after
the header.
Currently only 6 fields are written, specifically the number of contexts
(total, cold, hot), and the max context size (cold, warm, hot).
To support forwards and backwards compatibility for added fields in the
indexed profile, the number of fields serialized first. The code is
written to support forwards compatibility (reading newer profiles with
additional summary fields), and comments indicate how to implement
backwards compatibility (reading older profiles with fewer summary
fields) as needed.
Support is added to print the summary as YAML comments when displaying
both the raw and indexed profiles via `llvm-profdata show`. Because they
are YAML comments, the YAML reader ignores these (the summary is always
recomputed when building the indexed profile as described above).
This necessitated moving some options and a couple of interfaces out of
Analysis/MemoryProfileInfo.cpp and into the new
ProfileData/MemProfSummary.cpp file, as we need to classify context
hotness earlier and also compute context ids to build the summary from
older indexed profiles.
|
|
Part of a larger refactoring with the following goals
1. Reduce the size of MemProf.h
2. Avoid including ModuleSummaryIndex just for a couple of types
|
|
Part of a larger refactoring with the following goals
1. Reduce the size of MemProf.h
2. Avoid including ModuleSummaryIndex just for a couple of types
|
|
Part of a larger refactoring with the following goals
1. Reduce the size of MemProf.h
2. Avoid including ModuleSummaryIndex just for a couple of types
|
|
|
|
This patch adds CalleeGuids to the serialized format and increments the version number to 4. The unit tests are updated to include a new test for v4 and the YAML format is also updated to be able to roundtrip the v4 format.
|
|
* Added YAML traits for `CallSiteInfo`
* Updated the `MemProfReader` to pass `Frames` instead of the entire
`CallSiteInfo`
* Updated test cases to use `testing::Field`
* Add YAML sequence traits for CallSiteInfo in MemProfYAML
* Also extend IndexedMemProfRecord
* XFAIL the MemProfYaml round trip test until we update the profile
format
For now we only read and write the additional information from the YAML
format. The YAML round trip test will be enabled when the serialized format is updated.
|
|
This patch introduces IndexedCallstackIdConveter as a convenience
wrapper around FrameIdConverter and CallStackIdConverter just for
tests.
With the new wrapper, we get to replace idioms like:
FrameIdConverter<decltype(MemProfData.Frames)> FrameIdConv(
MemProfData.Frames);
CallStackIdConverter<decltype(MemProfData.CallStacks)> CSIdConv(
MemProfData.CallStacks, FrameIdConv);
with:
IndexedCallstackIdConveter CSIdConv(MemProfData);
Unfortunately, this exact pattern occurs in tests only; the
combinations of the frame ID converter and call stack ID converter are
diverse in production code.
|
|
This patch checks the result of YAML parsing at the level of
MemProfRecord instead of IndexedMemProfRecord, thereby avoiding use of
Frame::hash and hashCallStacks. This makes sense because we
ultimately care about consumers like MemProfiler.cpp obtaining
MemProfRecord correctly; IndexedMemProfData and hash values are just
intermediaries.
Once this patch lands, we call Frame::hash and hashCallStacks only
when adding Frames or call stacks to their respective data structures.
In other words, the hash functions are pretty much business internal
to IndexedMemProfRecord.
|
|
In these tests, we just want to add one instance of
IndexedMemProfRecord to MemProfData.Records and retrieve it from
MemProfReader. There is no particular reason to associate F1.hash()
with the IndexedMemProfRecord instance. A fake value suffices.
While I am at it, I'm switching to try_emplace so that I can move
FakeRecord.
|
|
Migrating away from Frame::hash and hashCallStack further encapsulates
how the IDs are calculated.
Note that unit tests are the only places where Frame::hash and
hashCallStack are used. The code proper (i.e. llvm/lib) uses
IndexedMemProfData::{addFrame,addCallStack}; they do not directly use
Frame::hash or hashCallStack.
|
|
Here IndexedMemProfRecord just needs to reference a CallStackID, so we
can use addCallStack for a real hash-based CallStackId instead of a
fake value like 0x222.
|
|
This patch uses IndexedMemProfData in unit tests even when we only
need CallStacks. This way, we get to use addCallStack. Also, the
look is more consistent with other unit tests, where we typically do:
IndexMemProfData MemProfData;
MemProfData.addFrame(...);
MemProfData.addCallStack(...);
// Run some tests
|
|
Note that we already have:
using ::testing::IsEmpty;
|
|
This patch does two things:
- During deserialization, we accept a function name for Frame as an
alternative to the usual GUID expressed as a hexadecimal number.
- During serialization, we print a GUID of Frame as a 16-digit
hexadecimal number prefixed with 0x in the usual way. (Without this
patch, we print a decimal number, which is not customary.)
The patch uses a machinery called "normalization" in YAML I/O, which
lets us serialize and deserialize into an alternative data structure.
For our use case, we have an alternative Frame data structure, which
is identical to "struct Frame" except that Function is of type
GUIDHex64 instead of GlobalValue::GUID. This alternative type
supports the two bullet points above without modifying "struct Frame"
at all.
|
|
The YAML support is increasing in size, so this patch moves it to a
separate file.
|
|
|
|
"front" allows us to drop a dereference.
|
|
This patch does two things:
- During deserialization, we accept a function name as an alternative
to the usual GUID represented as a hexadecimal number.
- During serialization, we print a GUID as a 16-digit hexadecimal
number prefixed with 0x in the usual way. (Without this patch, we
print a decimal number, which is not customary.)
In YAML, the MemProf profile is a vector of pairs of GUID and
MemProfRecord. This patch accepts a function name for the GUID, but
it does not accept a function name for the GUID used in Frames yet.
That will be addressed in a subsequent patch.
|
|
|
|
MemProfTest.cpp is about MemProf, so mentioning llvm::memprof
everywhere is quite verbose.
|
|
This patch replaces memprof::Foo with Foo if we have corresponding:
using llvm::memprof::Foo;
|
|
When we call IndexedMemProfRecord::toMemProfRecord, we care about
getting the original (that is, non-indexed) MemProfRecord back, so we
should just verify that, not the hash values, which are
intermediaries.
There is a remote possibility of hash collisions where call stack
{F1, F2} might come back as {F1, F1} if F1.hash() == F2.hash() for
example. However, since FrameId uses BLAKE, the hash values should be
consistent across architectures. That is, if this test case works on
one architecture, it should work on others as well.
|
|
IndexedMemProfData eliminates the need for the "using" directives.
Also, we do not need to declare maps for individual components of the
MemProf profile.
|
|
These gtest matchers reduce the number of times we mention the
variables under examined.
|
|
This patch replaces FrameIdMap and CallStackIdMap with
IndexedMemProfData, which comes with recently introduced methods like
addFrame and addCallStack.
|
|
This patch adds a helper function to replace an idiom like:
CallStackId CSId = hashCallStack(CallStack)
MemProfData.CallStacks.try_emplace(CSId, CallStack);
// Do something with CSId.
|
|
This patch makes the YAML field name match the struct field name.
|
|
For Frames, we prefer the inline notation for the brevity.
For PortableMemInfoBlock, we go through all member fields and print
out those that are populated.
This iteration works around the unavailability of
ScalarTraits<uintptr_t> on macOS.
|
|
This reverts commit 7b8cf147addf7d3fb4630475c40153226f5fdbd0.
Breaks building on macOS
https://lab.llvm.org/buildbot/#/builders/190/builds/10737
https://lab.llvm.org/buildbot/#/builders/23/builds/5491
https://green.lab.llvm.org/job/llvm.org/job/clang-stage1-cmake-RA-incremental/6076/
|
|
This patch adds a helper function to replace an idiom like:
FrameId Id = F.hash();
MemProfData.Frames.try_emplace(Id, F);
// Do something with Id.
|
|
For Frames, we prefer the inline notation for the brevity.
For PortableMemInfoBlock, we go through all member fields and print
out those that are populated.
|
|
This tests uses existing "using" directives to shorten unit tests.
- llvm::memprof::hashCallStack -> hashCallStack
- testing::Pair -> Pair
- testing::ElementsAreArray -> ElementsAre
- testing::Contains -> UnorderedElementsAre
|
|
This patch adds YAML-based deserialization for MemProf profile.
It's been painful to write tests for MemProf passes because we do not
have a text format for the MemProf profile. We would write a test
case in C++, run it for a binary MemProf profile, and then finally run
a test written in LLVM IR with the binary profile.
This patch paves the way toward YAML-based MemProf profile.
Specifically, it adds new class YAMLMemProfReader derived from
MemProfReader. For now, it only adds a function to parse StringRef
pointing to YAML data. Subseqeunt patches will wire it to
llvm-profdata and read from a file.
The field names are based on various printYAML functions in MemProf.h.
I'm not aiming for compatibility with the format used in printYAML,
but I don't see a point in changing the field names.
This iteration works around the unavailability of
ScalarTraits<uintptr_t> on macOS.
|
|
This reverts commit c00e53208db638c35499fc80b555f8e14baa35f0.
It looks like this breaks building LLVM on macOS and some other
platform/compiler combos
https://lab.llvm.org/buildbot/#/builders/23/builds/5252
https://green.lab.llvm.org/job/llvm.org/job/clang-san-iossim/5356/console
In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/lib/ProfileData/MemProfReader.cpp:34:
In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/MemProfReader.h:24:
In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/InstrProfReader.h:22:
In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/InstrProfCorrelator.h:21:
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:1173:36: error: implicit instantiation of undefined template 'llvm::yaml::MissingTrait<unsigned long>'
char missing_yaml_trait_for_type[sizeof(MissingTrait<T>)];
^
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:961:7: note: in instantiation of function template specialization 'llvm::yaml::yamlize<unsigned long>' requested here
yamlize(*this, Val, Required, Ctx);
^
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:883:11: note: in instantiation of function template specialization 'llvm::yaml::IO::processKey<unsigned long, llvm::yaml::EmptyContext>' requested here
this->processKey(Key, Val, true, Ctx);
^
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/MIBEntryDef.inc:55:1: note: in instantiation of function template specialization 'llvm::yaml::IO::mapRequired<unsigned long>' requested here
MIBEntryDef(AccessHistogram = 27, AccessHistogram, uintptr_t)
^
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/lib/ProfileData/MemProfReader.cpp:77:8: note: expanded from macro 'MIBEntryDef'
Io.mapRequired(KeyStr.str().c_str(), MIB.Name); \
^
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:310:8: note: template is declared here
struct MissingTrait;
^
1 error generated.
|
|
This patch adds YAML-based deserialization for MemProf profile.
It's been painful to write tests for MemProf passes because we do not
have a text format for the MemProf profile. We would write a test
case in C++, run it for a binary MemProf profile, and then finally run
a test written in LLVM IR with the binary profile.
This patch paves the way toward YAML-based MemProf profile.
Specifically, it adds new class YAMLMemProfReader derived from
MemProfReader. For now, it only adds a function to parse StringRef
pointing to YAML data. Subseqeunt patches will wire it to
llvm-profdata and read from a file.
The field names are based on various printYAML functions in MemProf.h.
I'm not aiming for compatibility with the format used in printYAML,
but I don't see a point in changing the field names.
|
|
IndexedMemProfRecord contains a complete package of the MemProf
profile, including frames, call stacks, and records. This patch
replaces the three member variables of MemProfReader with
IndexedMemProfRecord.
This transition significantly simplies both the constructor and the
final "take" method:
MemProfReader(IndexedMemProfData MemProfData)
: MemProfData(std::move(MemProfData)) {}
IndexedMemProfData takeMemProfData() { return std::move(MemProfData); }
|
|
CallStackRadixTreeBuilder::build takes the parameter
MemProfFrameIndexes by value, involving copies:
std::optional<const llvm::DenseMap<FrameIdTy, LinearFrameId>>
MemProfFrameIndexes
Then "build" makes another copy of MemProfFrameIndexe and passes it to
encodeCallStack for every call stack, which is painfully slow.
This patch changes the type to a pointer so that we don't have to make
a copy every time we pass the argument.
Without this patch, it takes 553 seconds to run "llvm-profdata merge"
on a large MemProf raw profile. This patch shortenes that down to 67
seconds.
|
|
This patch removes MemProf format Version 1 now that Version 2 and 3
are working well.
|
|
This patch updates a unit test to construct MemProfReader with
IndexedMemProfData, a complete package of MemProf profile.
With this change, nobody in the LLVM codebase is using the
MemProfReader constructor that takes individual components of the
MemProf profile, so this patch deprecates the constructor.
|
|
Prepare for usage in the bitcode reader/writer where we already have a
LinearFrameId:
- templatize input frame id type in CallStackRadixTreeBuilder
- templatize input frame id type in computeFrameHistogram
- make the map from FrameId to LinearFrameId optional
We plan to use the same radix format in the ThinLTO summary records,
where we already have a LinearFrameId.
|
|
This patch adds a new constructor to MemProfReader that takes
IndexedMemProfData, a complete package of MemProf profile. To
showcase its usage, I'm updating one of the unit tests to use the new
constructor.
Because of type mismatches between DenseMap and MapVector, I'm copying
Frames and CallStacks for now. Once we remove the methods and old
constructors that take or return individual components (frames, call
stacks, and records), we will drop the copying, and the new
constructor will collapse down to:
MemProfReader(IndexedMemProfData MemProfData)
: MemProfData(std::move(MemProfData)) {}
Since nobody in the LLVM codebase uses the constructor that takes the
three indivdual components, I'm deprecating the old constructor.
|
|
This patch adds another constructor to IndexedAllocationInfo that is
identical to the existing constructor except that the new one leaves
the CallStack field empty.
I'm planning to remove MemProf format Version 1. Then we will migrate
the users of the existing constructor to the new one as nobody will be
using the CallStack field anymore.
Adding the new constructor now allows us to migrate a few existing
users of the old constructor even before we remove the CallStack
field. In turn, that simplifies the patch to actually remove the
field.
|
|
This patch removes MemProf format Version 0 now that version 2 and 3
seem to be working well.
I'm not touching version 1 for now because some tests still rely on
version 1.
Note that Version 0 is identical to Version 1 except that the MemProf
section of the indexed format has a MemProf version field.
|
|
raw_string_ostream::flush() is essentially a no-op (also specified in docs).
Don't call it in tests that aren't meant to test 'raw_string_ostream' itself.
p.s. remove a few redundant calls to raw_string_ostream::str()
|
|
Adds compile time flag -mllvm -memprof-histogram and runtime flag
histogram=true|false to turn Histogram collection on and off. The
-memprof-histogram flag relies on -memprof-use-callbacks=true to work.
Updates shadow mapping logic in histogram mode from having one 8 byte
counter for 64 bytes, to 1 byte for 8 bytes, capped at 255. Only
supports this granularity as of now.
Updates the RawMemprofReader and serializing MemoryInfoBlocks to binary
format, including changing to a new version of the raw binary format
from version 3 to version 4.
Updates creating MemoryInfoBlocks with and without Histograms. When two
MemoryInfoBlocks are merged, AccessCounts are summed up and the shorter
Histogram is removed.
Adds a memprof_histogram test case.
Initial commit for adding AccessCountHistograms up until RawProfile for
memprof
|
|
We call llvm::sort in a couple of places in the V3 encoding:
- We sort Frames by FrameIds for stability of the output.
- We sort call stacks in the dictionary order to maximize the length
of the common prefix between adjacent call stacks.
It turns out that we can improve the deserialization performance by
modifying the comparison functions -- without changing the format at
all. Both places take advantage of the histogram of Frames -- how
many times each Frame occurs in the call stacks.
- Frames: We serialize popular Frames in the descending order of
popularity for improved cache locality. For two equally popular
Frames, we break a tie by serializing one that tends to appear
earlier in call stacks. Here, "earlier" means a smaller index
within llvm::SmallVector<FrameId>.
- Call Stacks: We sort the call stacks to reduce the number of times
we follow pointers to parents during deserialization. Specifically,
instead of comparing two call stacks in the strcmp style -- integer
comparisons of FrameIds, we compare two FrameIds F1 and F2 with
Histogram[F1] < Histogram[F2] at respective indexes. Since we
encode from the end of the sorted list of call stacks, we tend to
encode popular call stacks first.
Since the two places use the same histogram, we compute it once and
share it in the two places.
Sorting the call stacks reduces the number of "jumps" by 74% when we
deserialize all MemProfRecords. The cycle and instruction counts go
down by 10% and 1.5%, respectively.
If we sort the Frames in addition to the call stacks, then the cycle
and instruction counts go down by 14% and 1.6%, respectively, relative
to the same baseline (that is, without this patch).
|
|
This patch integrates CallStackRadixTreeBuilder into the V3 format,
reducing the profile size to about 27% of the V2 profile size.
- Serialization: writeMemProfCallStackArray just needs to write out
the radix tree array prepared by CallStackRadixTreeBuilder.
Mappings from CallStackIds to LinearCallStackIds are moved by new
function CallStackRadixTreeBuilder::takeCallStackPos.
- Deserialization: Deserializing a call stack is the same as
deserializing an array encoded in the obvious manner -- the length
followed by the payload, except that we need to follow a pointer to
the parent to take advantage of common prefixes once in a while.
This patch teaches LinearCallStackIdConverter to how to handle those
pointers.
|