path: root/llvm/lib/ProfileData/InstrProfWriter.cpp
Age | Commit message | Author | Files | Lines
2025-08-17[llvm] Remove unused includes (NFC) (#154051)Kazu Hirata1-1/+0
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-08-08[InstrProf] Fix trace reservoir sampling (#152563)Ellis Hoag1-45/+18
`InstrProfWriter::addTemporalProfileTraces()` did not correctly handle the case where the source traces are sampled but the reservoir size is larger than it was before, meaning there is room for more traces. It also did not handle the case where the reservoir size decreased, meaning traces should be truncated. Depends on https://github.com/llvm/llvm-project/pull/152550 for the test refactor.
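For readers unfamiliar with the term, the reservoir mentioned above follows the general shape of textbook reservoir sampling (Algorithm R). The following is a minimal sketch of that textbook technique, not the code in `InstrProfWriter::addTemporalProfileTraces()`; the names `Reservoir`, `add`, and `shrinkTo`, the fixed RNG seed, and the use of `int` items are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical illustration of reservoir sampling (Algorithm R); not LLVM code.
struct Reservoir {
  std::size_t Capacity;          // Maximum number of items kept.
  std::uint64_t StreamSize = 0;  // Total number of items ever offered.
  std::vector<int> Items;        // The current sample.
  std::mt19937_64 RNG{42};       // Fixed seed, chosen only for the sketch.

  explicit Reservoir(std::size_t Cap) : Capacity(Cap) {}

  void add(int Item) {
    ++StreamSize;
    if (Items.size() < Capacity) {
      // There is still room, so the item is kept unconditionally.
      Items.push_back(Item);
      return;
    }
    if (Capacity == 0)
      return;
    // The reservoir is full: replace a slot with probability
    // Capacity / StreamSize.
    std::uniform_int_distribution<std::uint64_t> Dist(0, StreamSize - 1);
    std::uint64_t Idx = Dist(RNG);
    if (Idx < Capacity)
      Items[Idx] = Item;
  }

  // A smaller capacity truncates the items that no longer fit.
  void shrinkTo(std::size_t NewCap) {
    Capacity = NewCap;
    if (Items.size() > NewCap)
      Items.resize(NewCap);
  }
};
```

The two situations named in the commit message map onto the "room left" branch of `add` and onto `shrinkTo`; how the real writer combines them when merging already-sampled sources is not shown here.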
2025-06-04[llvm] Remove unused includes (NFC) (#142733)Kazu Hirata1-1/+0
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-05-29[MemProf] Summary section cleanup (NFC) (#142003)Teresa Johnson1-4/+1
Address post-commit review comments from PR141805. Misc cleanup but the biggest changes are moving some common utilities to new MemProfCommon files to reduce unnecessary includes.
2025-05-28[MemProf] Add basic summary section support (#141805)Teresa Johnson1-5/+13
This patch adds support for a basic MemProf summary section, which is built along with the indexed MemProf profile (e.g. when reading the raw or YAML profiles), and serialized through the indexed profile just after the header. Currently only 6 fields are written, specifically the number of contexts (total, cold, hot), and the max context size (cold, warm, hot). To support forwards and backwards compatibility for added fields in the indexed profile, the number of fields is serialized first. The code is written to support forwards compatibility (reading newer profiles with additional summary fields), and comments indicate how to implement backwards compatibility (reading older profiles with fewer summary fields) as needed. Support is added to print the summary as YAML comments when displaying both the raw and indexed profiles via `llvm-profdata show`. Because they are YAML comments, the YAML reader ignores these (the summary is always recomputed when building the indexed profile as described above). This necessitated moving some options and a couple of interfaces out of Analysis/MemoryProfileInfo.cpp and into the new ProfileData/MemProfSummary.cpp file, as we need to classify context hotness earlier and also compute context ids to build the summary from older indexed profiles.
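The "number of fields first" scheme can be sketched roughly as follows. This is a simplified illustration over a vector of 64-bit words, not the MemProf summary code; `writeSummary` and `readSummary` are hypothetical names, and defaulting missing fields to zero is an assumption about one way backwards compatibility could be handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a count-prefixed field list; not the MemProf code.
// Writer: emit the number of fields, then the fields themselves.
std::vector<std::uint64_t>
writeSummary(const std::vector<std::uint64_t> &Fields) {
  std::vector<std::uint64_t> Out;
  Out.push_back(Fields.size());
  Out.insert(Out.end(), Fields.begin(), Fields.end());
  return Out;
}

// Reader that understands NumKnown fields. It reads at most NumKnown of
// them, skips any extra fields written by a newer producer, and sets Next
// to the index of the first word after the summary.
std::vector<std::uint64_t> readSummary(const std::vector<std::uint64_t> &In,
                                       std::size_t NumKnown,
                                       std::size_t &Next) {
  std::size_t NumWritten = static_cast<std::size_t>(In.at(0));
  // Assumption: fields missing from an older profile default to zero.
  std::vector<std::uint64_t> Fields(NumKnown, 0);
  for (std::size_t I = 0; I < NumKnown && I < NumWritten; ++I)
    Fields[I] = In.at(1 + I);
  Next = 1 + NumWritten;
  return Fields;
}
```

Because the reader advances by the written count rather than by the count it knows about, a profile with a seventh field can still be read by code that only understands six.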
2025-05-24Re-apply "[StaticDataLayout][PGO]Implement reader and writer change for data access profiles" (#141275)Mingming Liu1-2/+11
Re-apply https://github.com/llvm/llvm-project/pull/139997 after fixing the use-of-uninitialized-memory error (https://lab.llvm.org/buildbot/#/builders/94/builds/7373). Tested: The error is reproduced with https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_bootstrap_msan.sh without the fix, and the test passes with the fix. **Original commit message:** https://github.com/llvm/llvm-project/pull/138170 introduces classes to operate on data access profiles. This change supports the read and write of `DataAccessProfData` in the indexed format of MemProf (v4) as well as its text (YAML) format. For indexed format: * InstrProfWriter owns (by `std::unique_ptr<DataAccessProfData>`) the data access profiles, and gives a non-owned copy when it calls `writeMemProf`. * MemProf v4 header has a new `uint64_t` to record the byte offset of data access profiles. This `uint64_t` field is zero if data access profile is not set (nullptr). * MemProfReader reads the offset from v4 header and de-serializes in-memory bytes into class `DataAccessProfData`. For textual format: * MemProfYAML.h adds the mapping for DAP class, and makes DAP optional for both read and write. 099a0fa (by @snehasish) introduces v4 which contains CalleeGuids in CallSiteInfo, and this change changes the v4 format in place with data access profiles. The current plan is to bump the version and enable v4 profiles with both features, assuming waiting for this change won't delay the callsite change too long. --------- Co-authored-by: Kazu Hirata <kazu@google.com>
2025-05-22Revert "[StaticDataLayout][PGO]Implement reader and writer change for data access profiles" (#141157)Mingming Liu1-11/+2
Reverts llvm/llvm-project#139997 Sanitizer failures (https://lab.llvm.org/buildbot/#/builders/94/builds/7373) Will fix forward later.
2025-05-22[StaticDataLayout][PGO]Implement reader and writer change for data access profiles (#139997)Mingming Liu1-2/+11
https://github.com/llvm/llvm-project/pull/138170 introduces classes to operate on data access profiles. This change supports the read and write of `DataAccessProfData` in the indexed format of MemProf (v4) as well as its text (YAML) format. For indexed format: * InstrProfWriter owns (by `std::unique_ptr<DataAccessProfData>`) the data access profiles, and gives a non-owned copy when it calls `writeMemProf`. * MemProf v4 header has a new `uint64_t` to record the byte offset of data access profiles. This `uint64_t` field is zero if data access profile is not set (nullptr). * MemProfReader reads the offset from v4 header and de-serializes in-memory bytes into class `DataAccessProfData`. For textual format: * MemProfYAML.h adds the mapping for DAP class, and makes DAP optional for both read and write. 099a0fa (by @snehasish) introduces v4 which contains CalleeGuids in CallSiteInfo, and this change changes the v4 format in place with data access profiles. The current plan is to bump the version and enable v4 profiles with both features, assuming waiting for this change won't delay the callsite change too long. --------- Co-authored-by: Kazu Hirata <kazu@google.com>
2025-05-19[NFC][MemProf] Move IndexedMemProfData to its own header. (#140503)Snehasish Kumar1-1/+0
Part of a larger refactoring with the following goals 1. Reduce the size of MemProf.h 2. Avoid including ModuleSummaryIndex just for a couple of types
2025-05-17[ProfileData] Use DenseMap::try_emplace (NFC) (#140394)Kazu Hirata1-8/+2
We can simplify the code with structured binding and try_emplace. Note that try_emplace default-constructs the value if omitted. FWIW, structured binding, a C++17 feature, wasn't available in our codebase at the time the code was written.
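A small sketch of the pattern described above, using `std::unordered_map` as a stand-in for `llvm::DenseMap` so the example is self-contained (that substitution is an assumption; both containers provide `try_emplace`). The function name `assignIds` is hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Assigns each distinct word a dense ID in first-seen order, using one hash
// lookup per word. std::unordered_map stands in for llvm::DenseMap here.
std::unordered_map<std::string, int>
assignIds(const std::vector<std::string> &Words) {
  std::unordered_map<std::string, int> Ids;
  int NextId = 0;
  for (const std::string &W : Words) {
    // try_emplace value-initializes the mapped int when the key is new and
    // leaves an existing entry untouched otherwise.
    auto [It, Inserted] = Ids.try_emplace(W);
    if (Inserted)
      It->second = NextId++;
  }
  return Ids;
}
```

The structured binding names both halves of the returned pair, which avoids the separate `find` followed by `insert` that older code tends to use.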
2025-05-15[NFC] One-liner clang-format (#140104)Mingming Liu1-3/+1
`InstrProfWriter::setOutputSparse` gets re-formatted when InstrProfWriter.cpp is modified. So formatted this line.
2025-04-23[memprof] Move writeMemProf to a separate file (#137051)Kazu Hirata1-282/+1
This patch moves writeMemProf and its subroutines to a separate file. The intent is as follows: - Reduce the size of InstrProfWriter.cpp. - Move the subroutines to a separate file because they don't interact with anything else in InstrProfWriter.cpp. Remarks: - The new file is named IndexedMemProfData.cpp without "Writer" in the name so that we can move the reader code to this file in the future. - This patch just moves code without changing the function signatures for now. It might make sense to implement a class encompassing "serialize" and "deserialize" methods for IndexedMemProfData, but that's left to subsequent patches.
2025-04-23[NFCI] Move ProfOStream from InstrProfWriter.cpp to InstrProf.h/cpp (#136791)Mingming Liu1-60/+0
ProfOStream is a wrapper class for output stream, and used by InstrProfWriter.cpp to serialize various profiles, like PGO profiles and MemProf. This change proposes to move it into InstrProf.h/cpp. After this is in, InstrProfWriter can dispatch serialization of various formats into methods like `obj->serialize()`, and the serialization code could be moved out of InstrProfWriter.cpp into individual classes (each in a smaller cpp file). One example is that we can gradually move writeMemprof [1] into llvm/*/ProfileData/MemProf.h/cpp, where a couple of classes already have `serialize/deserialize` methods. [1] https://github.com/llvm/llvm-project/blob/85b35a90770b6053f91d79ca685cdfa4bf6499a4/llvm/lib/ProfileData/InstrProfWriter.cpp#L774-L791
2025-02-27[ProfileData] Avoid repeated hash lookups (NFC) (#129194)Kazu Hirata1-2/+3
2024-12-05[ProfileData] Add InstrProfWriter::writeBinaryIds (NFC) (#118754)Kazu Hirata1-34/+41
The patch makes InstrProfWriter::writeImpl less monolithic by adding InstrProfWriter::writeBinaryIds to serialize binary IDs. This way, InstrProfWriter::writeImpl can simply call the new function instead of handling all the details within writeImpl.
2024-12-04[PGO] Add option to always instrumenting loop entries (#116789)ronryvchin1-0/+7
This patch extends the PGO infrastructure with an option to prefer the instrumentation of loop entry blocks. This option is a generalization of https://github.com/llvm/llvm-project/commit/19fb5b467bb97f95eace1f3637d2d1041cebd3ce, and helps to cover cases where the loop exit is never executed. Event handling loops are one example where this can occur. Note that this change does NOT change the default behavior.
2024-11-24[memprof] Speed up llvm-profdata (#117446)Kazu Hirata1-1/+1
CallStackRadixTreeBuilder::build takes the parameter MemProfFrameIndexes by value, involving copies: std::optional<const llvm::DenseMap<FrameIdTy, LinearFrameId>> MemProfFrameIndexes Then "build" makes another copy of MemProfFrameIndexes and passes it to encodeCallStack for every call stack, which is painfully slow. This patch changes the type to a pointer so that we don't have to make a copy every time we pass the argument. Without this patch, it takes 553 seconds to run "llvm-profdata merge" on a large MemProf raw profile. This patch shortens that down to 67 seconds.
2024-11-24[memprof] Add an assert to InstrProfWriter::addMemProfData (#117426)Kazu Hirata1-3/+8
This patch adds a quick validity check to InstrProfWriter::addMemProfData. Specifically, we check to see if we have all (or none) of the MemProf profile components (frames, call stacks, records). The credit goes to Teresa Johnson for suggesting this assert.
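The "all or none" condition can be expressed compactly. The sketch below is an assumption about the shape of such a check, not the assert that was added to `InstrProfWriter::addMemProfData`; `ProfileParts` and `isAllOrNone` are hypothetical names, and plain vectors stand in for the real containers.

```cpp
#include <vector>

// Hypothetical stand-in for the three MemProf components.
struct ProfileParts {
  std::vector<int> Frames;
  std::vector<int> CallStacks;
  std::vector<int> Records;
};

// Returns true when either every component is populated or none is.
bool isAllOrNone(const ProfileParts &P) {
  int NumPopulated = 0;
  NumPopulated += P.Frames.empty() ? 0 : 1;
  NumPopulated += P.CallStacks.empty() ? 0 : 1;
  NumPopulated += P.Records.empty() ? 0 : 1;
  return NumPopulated == 0 || NumPopulated == 3;
}
```

In a writer, a predicate like this would typically sit inside an `assert` at the API boundary so that a partially populated profile is caught in debug builds.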
2024-11-22[memprof] Remove MemProf format Version 1 (#117357)Kazu Hirata1-37/+0
This patch removes MemProf format Version 1 now that Version 2 and 3 are working well.
2024-11-20[MemProf] Templatize CallStackRadixTreeBuilder (NFC) (#117014)Teresa Johnson1-1/+1
Prepare for usage in the bitcode reader/writer where we already have a LinearFrameId: - templatize input frame id type in CallStackRadixTreeBuilder - templatize input frame id type in computeFrameHistogram - make the map from FrameId to LinearFrameId optional We plan to use the same radix format in the ThinLTO summary records, where we already have a LinearFrameId.
2024-11-19[ProfileData] Remove unused includes (NFC) (#116751)Kazu Hirata1-1/+0
Identified with misc-include-cleaner.
2024-11-19[memprof] Add MemProfReader::takeMemProfData (#116769)Kazu Hirata1-1/+2
This patch adds MemProfReader::takeMemProfData, a function to return the complete MemProf profile from the reader. We can directly pass its return value to InstrProfWriter::addMemProfData without having to deal with the individual components of the MemProf profile. The new function is named "take", but it doesn't do std::move yet because of type differences (DenseMap vs. MapVector). The end state I'm trying to get to is roughly as follows: - MemProfReader accepts IndexedMemProfData as a parameter as opposed to the three individual components (frames, call stacks, and records). - MemProfReader keeps IndexedMemProfData as a class member without decomposing it into its individual components. - MemProfReader returns IndexedMemProfData like: IndexedMemProfData takeMemProfData() { return std::move(MemProfData); }
2024-11-18[memprof] Add InstrProfWriter::addMemProfData (#116528)Kazu Hirata1-0/+29
This patch adds InstrProfWriter::addMemProfData, which adds the complete MemProf profile (frames, call stacks, and records) to the writer context. Without this function, functions like loadInput in llvm-profdata.cpp and InstrProfWriter::mergeRecordsFromWriter must add one item (frame, call stack, or record) at a time. The new function std::moves the entire MemProf profile to the writer context if the destination is empty, which is the common use case. Otherwise, we fall back to adding one item at a time behind the scenes. Here are a couple of reasons why we should add this function: - We've had a bug where we forgot to add one of the three data structures (frames, call stacks, and records) to the writer context, resulting in a nearly empty indexed profile. We should always package the three data structures together, especially on API boundaries. - We expose a little too much of the MemProf detail to InstrProfWriter. I'd like to gradually transform InstrProfReader/Writer to entities managing buffers (sequences of bytes), with actual serialization/deserialization left to external classes. We already do some of this in InstrProfReader, where InstrProfReader "contracts out" to IndexedMemProfReader to handle MemProf details. I am not changing loadInput or InstrProfWriter::mergeRecordsFromWriter for now because MemProfReader uses DenseMap for frames and call stacks, whereas MemProfData uses MapVector. I'll resolve these mismatches in subsequent patches.
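The "move if the destination is empty, otherwise add item by item" strategy might look like the sketch below. It is an illustration under stated assumptions: `Profile` and `addProfile` are hypothetical, `std::map` stands in for the real containers, and keeping the existing entry on a key collision is an assumption rather than the documented merge rule.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in that packages the three components together.
struct Profile {
  std::map<int, std::string> Frames;
  std::map<int, std::string> CallStacks;
  std::map<int, std::string> Records;

  bool empty() const {
    return Frames.empty() && CallStacks.empty() && Records.empty();
  }
};

void addProfile(Profile &Dest, Profile Incoming) {
  if (Dest.empty()) {
    // Common case: take ownership of everything in one step.
    Dest = std::move(Incoming);
    return;
  }
  // Fallback: add one item at a time. try_emplace leaves an existing key
  // untouched (an assumption about the desired collision behavior).
  for (auto &[Id, Frame] : Incoming.Frames)
    Dest.Frames.try_emplace(Id, std::move(Frame));
  for (auto &[Id, Stack] : Incoming.CallStacks)
    Dest.CallStacks.try_emplace(Id, std::move(Stack));
  for (auto &[Id, Record] : Incoming.Records)
    Dest.Records.try_emplace(Id, std::move(Record));
}
```

Taking `Incoming` by value lets callers choose between copying and `std::move`, and keeps all three components travelling together across the API boundary.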
2024-11-15[memprof] Remove MemProf format Version 0 (#116442)Kazu Hirata1-35/+0
This patch removes MemProf format Version 0 now that version 2 and 3 seem to be working well. I'm not touching version 1 for now because some tests still rely on version 1. Note that Version 0 is identical to Version 1 except that the MemProf section of the indexed format has a MemProf version field.
2024-11-14[memprof] Speed up caller-callee pair extraction (#116184)Kazu Hirata1-3/+16
We know that the MemProf profile has a lot of duplicate call stacks. Extracting caller-callee pairs from a call stack we've seen before is a wasteful effort. This patch makes the extraction more efficient by first coming up with a work list of linear call stack IDs -- the set of starting positions in the radix tree array -- and then extracting caller-callee pairs from each call stack in the work list. We implement the work list as a bit vector because we expect the work list to be dense in the range [0, RadixTreeSize). Also, we want the set insertion to be cheap. Without this patch, it takes 25 seconds to extract caller-callee pairs from a large MemProf profile. This patch shortens that down to 4 seconds.
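The bit-vector work list can be sketched as follows. `std::vector<bool>` stands in for LLVM's `BitVector`, and `uniquePositions` is a hypothetical helper; the real patch goes on to extract caller-callee pairs from each position, which is omitted here.

```cpp
#include <cstdint>
#include <vector>

// Deduplicates starting positions with a bit vector sized to the radix tree
// array, then returns each distinct position once, in ascending order.
std::vector<std::uint32_t>
uniquePositions(const std::vector<std::uint32_t> &Positions,
                std::uint32_t RadixTreeSize) {
  std::vector<bool> WorkList(RadixTreeSize, false);
  for (std::uint32_t P : Positions)
    if (P < RadixTreeSize)
      WorkList[P] = true; // Constant-time insertion; duplicates collapse.

  std::vector<std::uint32_t> Unique;
  for (std::uint32_t I = 0; I < RadixTreeSize; ++I)
    if (WorkList[I])
      Unique.push_back(I);
  return Unique;
}
```

A bit vector suits this case because the positions are expected to be dense in `[0, RadixTreeSize)`, so one bit per slot costs less than a hash set and insertion never allocates.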
2024-10-30[MemProf] Include <ctime> to avoid MSVC failure (#114246)Teresa Johnson1-0/+1
My change in bb3915149a7c9b1660db9caebfc96343352e8454 added a call to std::time which worked generally as there must be some transitive include of <ctime>. However, I saw one MSVC bot failure: InstrProfWriter.cpp(202): error C2039: 'time': is not a member of 'std' from https://lab.llvm.org/buildbot/#/builders/63/builds/2325. Presumably explicitly including <ctime> should fix this.
2024-10-29[MemProf] Support for random hotness when writing profile (#113998)Teresa Johnson1-4/+38
Add support for generating random hotness in the memprof profile writer, to be used for testing. The random seed is printed to stderr, and an additional option enables providing a specific seed in order to reproduce a particular random profile.
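A reproducible-randomness setup of the kind described above could be sketched like this. Everything here is illustrative: `chooseSeed`, `randomColdFlags`, and the message text are hypothetical and do not reflect the actual option names or output of the profile writer. Falling back to `std::time` when no seed is given mirrors the `std::time` call mentioned in the follow-up commit above.

```cpp
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <iostream>
#include <optional>
#include <random>
#include <vector>

// Hypothetical sketch; names and the log message are assumptions.
// Uses the requested seed if one is given, otherwise derives one from the
// clock, and reports it on stderr so a run can be reproduced later.
std::uint64_t chooseSeed(std::optional<std::uint64_t> Requested) {
  std::uint64_t Seed =
      Requested ? *Requested : static_cast<std::uint64_t>(std::time(nullptr));
  std::cerr << "random hotness seed: " << Seed << "\n";
  return Seed;
}

// Produces N pseudo-random cold/not-cold flags. The same seed always yields
// the same sequence because the engine's raw output is used directly.
std::vector<bool> randomColdFlags(std::uint64_t Seed, std::size_t N) {
  std::mt19937_64 RNG(Seed);
  std::vector<bool> Flags;
  for (std::size_t I = 0; I < N; ++I)
    Flags.push_back((RNG() & 1) != 0);
  return Flags;
}
```

Printing the seed on every run is what makes a failing random test case recoverable: the value can be fed back in through the explicit-seed path.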
2024-07-02[ProfileData] Use ArrayRef in PatchItem (NFC) (#97379)Kazu Hirata1-15/+16
Packaging an array and its size as ArrayRef in PatchItem allows us to get rid of things like std::size(Header) and HeaderOffsets.size().
2024-06-18[memprof] Rename the members of IndexedMemProfData (NFC) (#94873)Kazu Hirata1-25/+25
I'm planning to use IndexedMemProfData in MemProfReader and beyond. Before I do so, this patch renames the members of IndexedMemProfData as MemProfData.FrameData is a bit of a mouthful with "Data" repeated twice. Note that MemProfReader currently has a trio -- IdToFrame, CSIdToCallStack, and FunctionProfileData. Replacing them with an instance of IndexedMemProfData allows us to use the move semantics from the reader to the writer context. More importantly, treating the profile data as one package makes the maintenance easier. In the past, forgetting to update a place dealing with the trio has resulted in a bug where we totally forgot to emit call stacks into the indexed profile.
2024-06-18[ProfileData] Clean up validateRecord (#95488)Kazu Hirata1-7/+4
validateRecord ensures that all the values are unique except for IPVK_IndirectCallTarget and IPVK_VTableTarget. The problem is that we exclude them in the innermost loop. This patch pulls the loop invariant out of the loop. While I am at it, this patch migrates a use of getValueForSite to getValueArrayForSite.
2024-06-14[llvm] Use llvm::unique (NFC) (#95628)Kazu Hirata1-2/+1
2024-06-14[ProfileData] Migrate to getValueArrayForSite (#95493)Kazu Hirata1-7/+6
This patch migrates uses of getValueForSite to getValueArrayForSite. Each hunk is self-contained, meaning that each one can be applied independently of the others. In the unit test, there are cases where the array length check is performed a lot earlier than the array content check. For now, I'm leaving the length checks where they are. I'll consider moving them when I migrate uses of getNumValueDataForSite to getValueArrayForSite in a follow-up patch.
2024-06-07[memprof] Improve deserialization performance in V3 (#94787)Kazu Hirata1-7/+36
We call llvm::sort in a couple of places in the V3 encoding: - We sort Frames by FrameIds for stability of the output. - We sort call stacks in the dictionary order to maximize the length of the common prefix between adjacent call stacks. It turns out that we can improve the deserialization performance by modifying the comparison functions -- without changing the format at all. Both places take advantage of the histogram of Frames -- how many times each Frame occurs in the call stacks. - Frames: We serialize popular Frames in the descending order of popularity for improved cache locality. For two equally popular Frames, we break a tie by serializing one that tends to appear earlier in call stacks. Here, "earlier" means a smaller index within llvm::SmallVector<FrameId>. - Call Stacks: We sort the call stacks to reduce the number of times we follow pointers to parents during deserialization. Specifically, instead of comparing two call stacks in the strcmp style -- integer comparisons of FrameIds, we compare two FrameIds F1 and F2 with Histogram[F1] < Histogram[F2] at respective indexes. Since we encode from the end of the sorted list of call stacks, we tend to encode popular call stacks first. Since the two places use the same histogram, we compute it once and share it in the two places. Sorting the call stacks reduces the number of "jumps" by 74% when we deserialize all MemProfRecords. The cycle and instruction counts go down by 10% and 1.5%, respectively. If we sort the Frames in addition to the call stacks, then the cycle and instruction counts go down by 14% and 1.6%, respectively, relative to the same baseline (that is, without this patch).
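The frame ordering described above (most popular first, ties broken by how early a frame tends to appear) can be sketched as below. This is an illustration only: `FrameStat`, `orderFrames`, the use of a summed position as the "appears earlier" measure, and the final tie-break on the ID itself are assumptions, and `std::sort` stands in for `llvm::sort`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using FrameId = std::uint64_t;

// Per-frame statistics gathered in one pass over all call stacks.
struct FrameStat {
  std::uint64_t Count = 0;        // How many times the frame occurs.
  std::uint64_t PositionSum = 0;  // Sum of its indexes within call stacks.
};

std::vector<FrameId>
orderFrames(const std::vector<std::vector<FrameId>> &CallStacks) {
  std::map<FrameId, FrameStat> Histogram;
  for (const auto &CS : CallStacks)
    for (std::size_t I = 0; I < CS.size(); ++I) {
      FrameStat &S = Histogram[CS[I]];
      ++S.Count;
      S.PositionSum += I;
    }

  std::vector<FrameId> Order;
  for (const auto &KV : Histogram)
    Order.push_back(KV.first);

  std::sort(Order.begin(), Order.end(), [&](FrameId A, FrameId B) {
    const FrameStat &SA = Histogram.at(A);
    const FrameStat &SB = Histogram.at(B);
    if (SA.Count != SB.Count)
      return SA.Count > SB.Count;              // Popular frames first.
    if (SA.PositionSum != SB.PositionSum)
      return SA.PositionSum < SB.PositionSum;  // Earlier in stacks first.
    return A < B;                              // Deterministic last resort.
  });
  return Order;
}
```

Since the histogram is computed once, the same table can drive both the frame ordering and the call stack comparison that the commit message describes.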
2024-06-07[ProfileData] Add const to a few places (NFC) (#94803)Kazu Hirata1-3/+3
2024-06-07[memprof] Use CallStackRadixTreeBuilder in the V3 format (#94708)Kazu Hirata1-15/+7
This patch integrates CallStackRadixTreeBuilder into the V3 format, reducing the profile size to about 27% of the V2 profile size. - Serialization: writeMemProfCallStackArray just needs to write out the radix tree array prepared by CallStackRadixTreeBuilder. Mappings from CallStackIds to LinearCallStackIds are moved by new function CallStackRadixTreeBuilder::takeCallStackPos. - Deserialization: Deserializing a call stack is the same as deserializing an array encoded in the obvious manner -- the length followed by the payload, except that we need to follow a pointer to the parent to take advantage of common prefixes once in a while. This patch teaches LinearCallStackIdConverter how to handle those pointers.
2024-05-31[memprof] Introduce memprof::LinearFrameId (NFC) (#94057)Kazu Hirata1-4/+6
This patch introduces memprof::LinearFrameId, which is a frame version of memprof::LinearCallStackId.
2024-05-31[memprof] Replace uint32_t with LinearCallStackId where appropriate (NFC) (#94023)Kazu Hirata1-8/+10
This patch replaces uint32_t with LinearCallStackId where appropriate. I'm replacing uint64_t with LinearCallStackId in writeMemProfCallStackArray, but that's OK because it's a value to be used as LinearCallStackId anyway.
2024-05-30[memprof] Use linear IDs for Frames and call stacks (#93740)Kazu Hirata1-13/+77
With this patch, we stop using on-disk hash tables for Frames and call stacks. Instead, we'll write out all the Frames as a flat array while maintaining mappings from FrameIds to the indexes into the array. Then we serialize call stacks in terms of those indexes. Likewise, we'll write out all the call stacks as another flat array while maintaining mappings from CallStackIds to the indexes into the call stack array. One minor difference from Frames is that the indexes into the call stack array are not contiguous because call stacks are variable-length objects. Then we serialize IndexedMemProfRecords in terms of the indexes into the call stack array. Now, we describe each call stack with 32-bit indexes into the Frame array (as opposed to the 64-bit FrameIds in Version 2). The use of the smaller type cuts down the profile file size by about 40% relative to Version 2. The departure from the on-disk hash tables contributes a little bit to the savings, too. For now, IndexedMemProfRecords refer to call stacks with 64-bit indexes into the call stack array. As a follow-up, I'll change that to uint32_t, including necessary updates to RecordWriterTrait.
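The flat-array-plus-mapping idea for Frames can be sketched as follows. The type aliases echo the names used in the commit messages, but `FrameTable`, `buildFrameTable`, and `encodeCallStack` are hypothetical helpers written for this illustration, and `std::unordered_map` stands in for the real map type.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using FrameId = std::uint64_t;        // 64-bit ID.
using LinearFrameId = std::uint32_t;  // 32-bit index into the flat array.

struct FrameTable {
  std::vector<FrameId> Array;                          // Serialized order.
  std::unordered_map<FrameId, LinearFrameId> IndexOf;  // FrameId -> index.
};

// Lays the distinct frames out as a flat array and remembers where each
// one landed.
FrameTable buildFrameTable(const std::vector<FrameId> &Frames) {
  FrameTable T;
  for (FrameId F : Frames) {
    auto [It, Inserted] =
        T.IndexOf.try_emplace(F, static_cast<LinearFrameId>(T.Array.size()));
    if (Inserted)
      T.Array.push_back(F);
  }
  return T;
}

// Rewrites a call stack in terms of 32-bit indexes into the frame array.
std::vector<LinearFrameId> encodeCallStack(const FrameTable &T,
                                           const std::vector<FrameId> &CS) {
  std::vector<LinearFrameId> Encoded;
  for (FrameId F : CS)
    Encoded.push_back(T.IndexOf.at(F));
  return Encoded;
}
```

Each encoded call stack entry takes 4 bytes instead of 8, which is the mechanism behind the size reduction the commit message reports.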
2024-05-29[memprof] Reorder MemProf sections in profile (#93640)Kazu Hirata1-11/+11
This patch teaches the V3 format to serialize Frames, call stacks, and IndexedMemProfRecords, in that order. I'm planning to use linear IDs for Frames. That is, Frames will be numbered 0, 1, 2, and so on in the order we serialize them. In turn, we will serialize the call stacks in terms of those linear IDs. Likewise, I'm planning to use linear IDs for call stacks and then serialize IndexedMemProfRecords in terms of those linear IDs for call stacks. With the new order, we can successively free data structures as we serialize them. That is, once we serialize Frames, we can free the Frames' data proper and just retain mappings from FrameIds to linear IDs. A similar story applies to call stacks.
2024-05-29[nfc][InstrProfWriter]Store header fields in a vector and back patch once (#93594)Mingming Liu1-46/+16
This is a split of https://github.com/llvm/llvm-project/pull/93346 as discussed.
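The subject of this commit names a common serialization pattern: reserve space for header fields whose values are not yet known, write the payload while collecting those values in a vector, then patch the header in a single step. The sketch below illustrates that general pattern over 64-bit words; `Stream`, `patch`, and `writeWithBackPatch` are hypothetical and are not the `ProfOStream` interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory stream of 64-bit words.
struct Stream {
  std::vector<std::uint64_t> Words;

  std::size_t tell() const { return Words.size(); }
  void write(std::uint64_t V) { Words.push_back(V); }

  // Overwrites a run of previously written words starting at Pos.
  void patch(std::size_t Pos, const std::vector<std::uint64_t> &Values) {
    for (std::size_t I = 0; I < Values.size(); ++I)
      Words[Pos + I] = Values[I];
  }
};

// Writes one header slot per section, then the sections, and finally fills
// in every slot with the offset at which its section starts.
std::vector<std::uint64_t>
writeWithBackPatch(const std::vector<std::vector<std::uint64_t>> &Sections) {
  Stream OS;
  std::size_t HeaderPos = OS.tell();
  for (std::size_t I = 0; I < Sections.size(); ++I)
    OS.write(0); // Placeholder, patched below.

  std::vector<std::uint64_t> Offsets; // Header fields collected in a vector.
  for (const auto &Section : Sections) {
    Offsets.push_back(OS.tell());
    for (std::uint64_t W : Section)
      OS.write(W);
  }

  OS.patch(HeaderPos, Offsets); // One back patch for the whole header.
  return OS.Words;
}
```

Collecting the offsets first and patching once keeps the header layout in one place, rather than scattering a separate patch call after each section.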
2024-05-28[memprof] Add MemProf format Version 3 (#93608)Kazu Hirata1-0/+52
This patch adds Version 3 for development purposes. For now, this patch adds V3 as a copy of V2. For the most part, this patch adds "case Version3:" wherever "case Version2:" appears. One exception is writeMemProfV3, which is copied from writeMemProfV2 but updated to write out memprof::Version3 to the MemProf header. We'll incrementally modify writeMemProfV3 in subsequent patches.
2024-05-22[nfc][InstrProfWriter]Wrap vtable writes in a method. (#93081)Mingming Liu1-28/+34
- This way `InstrProfWriter::writeImpl` itself is simpler.
2024-05-22[ProfileData] Use default member initializations (NFC) (#93120)Kazu Hirata1-9/+0
This patch uses default member initializations for all the fields in Header. The intent is to prevent accidental uninitialized fields and reduce the number of times we need to mention each member variable.
2024-05-18[nfc][InstrFDO]Encapsulate header writes in a class member function (#90142)Mingming Liu1-36/+34
The smaller class member functions are more focused and easier to maintain. This also paves the way for partial header forward compatibility in https://github.com/llvm/llvm-project/pull/88212 --------- Co-authored-by: Kazu Hirata <kazu@google.com>
2024-05-16[memprof] Update comments for writeMemProf and its helpers (#92446)Kazu Hirata1-13/+39
This patch adds comments for writeMemProf{V0,V1,V2} in a version-specific manner. The mostly repetitive nature of the comments is somewhat unfortunate but intentional to make it easy to retire older versions. Without this patch, the comment just before writeMemProf documents the Version1 format, which is very confusing.
2024-05-16[memprof] Group MemProf data structures into a struct (NFC) (#92360)Kazu Hirata1-52/+34
This patch groups the three Memprof data structures into a struct named IndexedMemProfData and teaches InstrProfWriter to use it. This way, we can pass IndexedMemProfData to writeMemProf and its helpers instead of individual data structures. As a follow-up, we can use the new struct in MemProfReader also. That in turn allows loadInput in llvm-profdata to move the MemProf data into the writer context, saving a few seconds for a large MemProf profile.
2024-05-15[InstrProf] Fix bug when clearing traces with samples (#92310)Ellis Hoag1-5/+6
The `--temporal-profile-max-trace-length=0` flag in the `llvm-profdata merge` command is used to remove traces from a profile. There was a bug where traces would not be cleared if the profile was already sampled. This patch fixes that.
2024-04-28[ProfileData] Use static_assert instead of assert (NFC)Kazu Hirata1-2/+2
Identified with misc-static-assert.
2024-04-25[memprof] Move getFullSchema and getHotColdSchema outside PortableMemInfoBlock (#90103)Kazu Hirata1-4/+4
These functions do not operate on PortableMemInfoBlock. This patch moves them outside the class.
2024-04-24[memprof] Reduce schema for Version2 (#89876)Kazu Hirata1-9/+14
Currently, the compiler only uses several fields of MemoryInfoBlock. Serializing all fields into the indexed MemProf file simply wastes storage. This patch limits the schema down to four fields for Version2 by default. It retains the old behavior of serializing all fields via: llvm-profdata merge --memprof-version=2 --memprof-full-schema This patch reduces the size of the indexed MemProf profile I have by 40% (1.6GB down to 1.0GB).