aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/ProfileData/SampleProfReader.cpp
AgeCommit message (Collapse)AuthorFilesLines
2022-02-14[CSSPGO] Do not merge a context that is already duplicated into the base ↵Hongtao Yu1-1/+1
profile. Do not merge a context that is already duplicated into the base profile. Also fixing a typo caused by previous refactoring. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D119735
2022-01-18[CSSPGO] Print "context-nested" instead of "preilnined" for ↵Hongtao Yu1-5/+4
ProfileSummarySection. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D117141
2021-12-14[CSSPGO] Use nested context-sensitive profile.Hongtao Yu1-29/+86
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing the indexing the full CS contexts. For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts, since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned. In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed. A CS-nested profile will look exactly the same with regular nested profile except that each nested profile can come with an attributes. With pseudo probes, a nested profile shown as below can also have a CFG checksum. ``` main:1968679:12 2: 24 3: 28 _Z5funcAi:18 3.1: 28 _Z5funcBi:30 3: _Z5funcAi:1467398 0: 10 1: 10 _Z8funcLeafi:11 3: 24 1: _Z8funcLeafi:1467299 0: 6 1: 6 3: 287884 4: 287864 _Z3fibi:315608 15: 23 !CFGChecksum: 138828622701 !Attributes: 2 !CFGChecksum: 281479271677951 !Attributes: 2 ``` Specific work included in this change: - A recursive profile converter to convert CS flat profile to nested profile. - Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile. - Unifiy sample loader inliner path for CS and preinlined nested profile. - Changes in the sample loader to support probe-based nested profile. I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1. Test Plan: Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D115205
2021-11-24[LLVM][NFC]Inclusive language: remove occurances of sanity check/test from llvmZarko Todorovski1-1/+1
Part of work to use more inclusive language in clang/llvm. Rewording some comments and change function and variable names.
2021-11-02[llvm-profdata] Print out section flags for FunctionMetadata sectionHongtao Yu1-0/+6
As titled. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D113064
2021-09-04Fix Wdocumentation unknown parameter warning. NFCI.Simon Pilgrim1-1/+1
2021-09-01[CSSPGO] Sort function offset table to speed up profile loading.Hongtao Yu1-30/+46
With the context split work, the context-based (an array of strings) sorting performed at profile load time is way more expansive than single-string-based sorting. This is likely due to auxiliary operations done on each array element, such as indirect references, std::min operations, also likely cache misses. In this change I'm presorting profiles during profile generation time to avoid sorting at compile time. Compared to the previous context-split work, this effectively cuts down compile time by 20% for one of our large services and brings us closer to non-CS build, with still a small gap in build time. Reviewed By: wenlei, wmi Differential Revision: https://reviews.llvm.org/D109036
2021-09-01[CSSPGO] Enable loading MD5 CS profile.Hongtao Yu1-24/+33
Adding the compiler support of MD5 CS profile based on pervious context split work D107299. A MD5 CS profile is about 40% smaller than the string-based extbinary profile. As a result, the compilation is 15% faster. There are a few conversion from real names to md5 names that have been made on the sample loader and context tracker side to get it work. Reviewed By: wenlei, wmi Differential Revision: https://reviews.llvm.org/D108342
2021-08-30[CSSPGO] Split context string to deduplicate function name used in the context.Hongtao Yu1-37/+103
Currently context strings contain a lot of duplicated function names and that significantly increase the profile size. This change split the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by the index into the name table and that significantly reduce the size consumed by context. A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings which is time- and memory- consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previous prevalent profile map which was implemented as a `StringRef` is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to using an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to use `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects that are stored in producer buffers. For compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings can be generated only in those cases of debugging and printing. When it comes to profile format, nothing has changed to the text format, though internally CS context is implemented as a vector. Extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, which each element as an offset points to `SecNameTable`. All occurrences of contexts elsewhere are redirected to using the offset of `SecCSNameTable`. Testing This is no-diff change in terms of code quality and profile content (for text profile). For our internal large service (aka ads), the profile generation is cut to half, with a 20x smaller string-based extbinary format generated. The compile time of ads is dropped by 25%. Differential Revision: https://reviews.llvm.org/D107299
2021-08-25[CSSPGO] Add switch for sample loader to honor global pre-inliner decision ↵Wenlei He1-1/+1
from llvm-profgen The change adds a switch to allow sample loader to use global pre-inliner's decision instead. The pre-inliner in llvm-profgen makes inline decision globally based on whole program profile and function byte size as cost proxy. Since pre-inliner also adjusts/merges context profile based on its inline decision, honoring its inline decision in sample loader would lead to better post-inline profile quality especially for thinlto where cross module profile merging isn't possible without pre-inliner. Minor fix in profile reader is also included. When pre-inliner is use, we now also turn off the default merging and trimming logic unless it's explicitly asked. Differential Revision: https://reviews.llvm.org/D108677
2021-08-25[SampleFDO] Set ProfileIsFS bit properly from the internal optionRong Xu1-0/+3
We have "-profile-isfs" internal option for text, binary, and compactbinary format (mostly for debug and test purpose). We need to set the related flag in FunctionSamples so that ProfileIsFS is written to the header in extbinary format. Differential Revision: https://reviews.llvm.org/D108707
2021-08-16[SamplePGO][NFC] Dump function profiles in orderHongtao Yu1-2/+4
Sample profiles are stored in a string map which is basically an unordered map. Printing out profiles by simply walking the string map doesn't enforce an order. I'm sorting the map in the decreasing order of total samples to enable a more stable dump, which is good for comparing two dumps. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D108147
2021-08-16Reapply commit b7425e956Rong Xu1-1/+1
The commit b7425e956: [NFC] fix typos is harmless but was reverted by accident. Reapply.
2021-08-16Revert "[NFC] Fix typos"Kostya Kortchinsky1-1/+1
This reverts commit b7425e956be60a73004d7ae5bb37da85872c29fb.
2021-08-16[NFC] Fix typosRong Xu1-1/+1
s/senstive/senstive/g
2021-06-04[SampleFDO] New hierarchical discriminator for FS SampleFDO (llvm-profdata part)Rong Xu1-1/+9
This patch was split from https://reviews.llvm.org/D102246 [SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This is for llvm-profdata part of change. It sets the bit masks for the profile reader in llvm-profdata. Also add an internal option "-fs-discriminator-pass" for show and merge command to process the profile offline. This patch also moved setDiscriminatorMaskedBitFrom() to SampleProfileReader::create() to simplify the interface. Differential Revision: https://reviews.llvm.org/D103550
2021-06-02[SampleFDO] New hierarchical discriminator for FS SampleFDO (ProfileData part)Rong Xu1-4/+31
This patch was split from https://reviews.llvm.org/D102246 [SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This is mainly for ProfileData part of change. It will load FS Profile when such profile is detected. For an extbinary format profile, create_llvm_prof tool will add a flag to profile summary section. For other format profiles, the users need to use an internal option (-profile-isfs) to tell the compiler that the profile uses FS discriminators. This patch also simplified the bit API used by FS discriminators. Differential Revision: https://reviews.llvm.org/D103041
2021-04-16[SystemZ][z/OS] Add IsText Argument to GetFile and GetFileOrSTDINJonathan Crowther1-1/+1
Add the `IsText` argument to `GetFile` and `GetFileOrSTDIN` which will help z/OS distinguish between text and binary correctly. This is an extension to [this patch](https://reviews.llvm.org/D97785) Reviewed By: abhina.sreeskantharajan, amccarth Differential Revision: https://reviews.llvm.org/D100488
2021-03-18[CSSPGO] Add attribute metadata for context profileWenlei He1-24/+50
This changes adds attribute field for metadata of context profile. Currently we have an inline attribute that indicates whether the leaf frame corresponding to a context profile was inlined in previous build. This will be used to help estimating inlining and be taken into account when trimming context. Changes for that in llvm-profgen will follow. It will also help tuning. Differential Revision: https://reviews.llvm.org/D98823
2021-03-15[CSSPGO] Load context profile for external functions in PreLink and populate ↵Wenlei He1-2/+47
ThinLTO import list For ThinLTO's prelink compilation, we need to put external inline candidates into an import list attached to function's entry count metadata. This enables ThinLink to treat such cross module callee as hot in summary index, and later helps postlink to import them for profile guided cross module inlining. For AutoFDO, the import list is retrieved by traversing the nested inlinee functions. For CSSPGO, since profile is flatterned, a few things need to happen for it to work: - When loading input profile in extended binary format, we need to load all child context profile whose parent is in current module, so context trie for current module includes potential cross module inlinee. - In order to make the above happen, we need to know whether input profile is CSSPGO profile before start reading function profile, hence a flag for profile summary section is added. - When searching for cross module inline candidate, we need to walk through the context trie instead of nested inlinee profile (callsite sample of AutoFDO profile). - Now that we have more accurate counts with CSSPGO, we swtiched to use entry count instead of total count to decided if an external callee is potentially beneficial to inline. This make it consistent with how we determine whether call tagert is potential inline candidate. Differential Revision: https://reviews.llvm.org/D98590
2021-03-09[SampleFDO] Support enabling -funique-internal-linkage-name.Wei Mi1-11/+33
now -funique-internal-linkage-name flag is available, and we want to flip it on by default since it is beneficial to have separate sample profiles for different internal symbols with the same name. As a preparation, we want to avoid regression caused by the flip. When we flip -funique-internal-linkage-name on, the profile is collected from binary built without -funique-internal-linkage-name so it has no uniq suffix, but the IR in the optimized build contains the suffix. This kind of mismatch may introduce transient regression. To avoid such mismatch, we introduce a NameTable section flag indicating whether there is any name in the profile containing uniq suffix. Compiler will decide whether to keep uniq suffix during name canonicalization depending on the NameTable section flag. The flag is only available for extbinary format. For other formats, by default compiler will keep uniq suffix so they will only experience transient regression when -funique-internal-linkage-name is just flipped. Another type of regression is caused by places where we miss to call getCanonicalFnName. Those places are fixed. Differential Revision: https://reviews.llvm.org/D96932
2021-02-05[CSSPGO] Use merged base profile for hot threshold calculationWenlei He1-5/+1
Context-sensitive profile effectively split a function profile into many copies each representing the CFG profile of a particular calling context. That makes the count distribution looks more flat as we now have more function profiles each with lower counts, which in turn leads to lower hot thresholds. Now we tells threshold computation to merge context profile first before calculating percentile based cutoffs to compensate for seemingly flat context profile. This can be controlled by swtich `sample-profile-contextless-threshold`. Earlier measurement showed ~0.4% perf boost with this tuning on spec2k6 for CSSPGO (with pseudo-probe and new inliner). Differential Revision: https://reviews.llvm.org/D95980
2021-02-01[CSSPGO] Tweaking inlining with pseudo probes.Hongtao Yu1-2/+7
Fixing up a couple places where `getCallSiteIdentifier` is needed to support pseudo-probe-based callsites. Also fixing an issue in the extbinary profile reader where the metadata section is not fully scanned based on the number of profiles loaded only for the current module. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95791
2021-01-27[CSSPGO] Support of CS profiles in extended binary format.Hongtao Yu1-40/+46
This change brings up support of context-sensitive profiles in the format of extended binary. Existing sample profile reader/writer/merger code is being tweaked to reflect the fact of bracketed input contexts, like (`[...]`). The paired brackets are also needed in extbinary profiles because we don't yet have an otherwise good way to tell calling contexts apart from regular function names since the context delimiter `@` can somehow serve as a part of the C++ mangled names. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95547
2021-01-21[llvm] Use isDigit (NFC)Kazu Hirata1-1/+1
2021-01-19[SampleFDO] Add the support to split the function profiles with context intoWei Mi1-0/+7
separate sections. For ThinLTO, all the function profiles without context has been annotated to outline functions if possible in prelink phase. In postlink phase, profile annotation in postlink phase is only meaningful for function profile with context. If the profile is large, it is better to split the profile into two parts, one with context and one without, so the profile reading in postlink phase only has to read the part with context. To have the profile splitting, we extend the ExtBinary format to support different section arrangement. It will be flexible to add other section layout in the future without the need to create new class inheriting from ExtBinary class. Differential Revision: https://reviews.llvm.org/D94435
2021-01-06[llvm] Use llvm::append_range (NFC)Kazu Hirata1-2/+2
2020-12-16[NFC][SampleFDO] Preparation to support multiple sections with the same typeWei Mi1-6/+14
in ExtBinary format. Currently ExtBinary format doesn't support multiple sections with the same type in the profile. We add the support in this patch. Previously we use the section type to identify a section uniquely. Now we introduces a LayoutIndex in the SecHdrTableEntry and use the LayoutIndex to locate the target section. The allocations of NameTable and FuncOffsetTable are adjusted accordingly. Currently it works as a NFC because it won't change anything for current layout. The test for multiple sections support will be included in another patch where a new type of profile containing multiple sections with the same type is introduced. Differential Revision: https://reviews.llvm.org/D93254
2020-12-16[CSSPGO] Consume pseudo-probe-based AutoFDO profileHongtao Yu1-13/+92
This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be slipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`. Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites). **Adjustment to profile format ** A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like ``` !CFGChecksum: 12345 ``` is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums. Differential Revision: https://reviews.llvm.org/D92347
2020-12-08[SampleFDO] Store fixed length MD5 in NameTable instead of using ULEB128 ifWei Mi1-5/+51
MD5 is used. Currently during sample profile loading, NameTable has to be loaded entirely up front before any name string is retrieved. That is because NameTable is stored using ULEB128 encoding and cannot be directly accessed like an array. However, if MD5 is used to represent name in the NameTable, it has fixed length. If MD5 names are stored in uint64_t type instead of ULEB128, NameTable can be accessed like an array then in many cases only part of the NameTable has to be read. This is helpful for reducing compile time especially when small source file is compiled. We find that after this change, the elapsed time to build a large application distributively is reduced by 5% and the accumulative cpu time used for building is also reduced by 5%. The size of the profile is slightly reduced with this change by ~0.2%, and that also indicates encoding MD5 in ULEB128 doesn't save the storage space. Differential Revision: https://reviews.llvm.org/D92621
2020-12-06[CSSPGO] Infrastructure for context-sensitive Sample PGO and InliningWenlei He1-3/+18
This change adds the context-senstive sample PGO infracture described in CSSPGO RFC (https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s). It introduced an abstraction between input profile and profile loader that queries input profile for functions. Specifically, there's now the notion of base profile and context profile, and they are managed by the new SampleContextTracker for adjusting and merging profiles based on inline decisions. It works with top-down profiled guided inliner in profile loader (https://reviews.llvm.org/D70655) for better inlining with specialization and better post-inline profile fidelity. In the future, we can also expose this infrastructure to CGSCC inliner in order for it to take advantage of context-sensitive profile. This change is the consumption part of context-sensitive profile (The generation part is in this stack: https://reviews.llvm.org/D89707). We've seen good results internally in conjunction with Pseudo-probe (https://reviews.llvm.org/D86193). Pacthes for integration with Pseudo-probe coming up soon. Currently the new infrastructure kick in when input profile contains the new context-sensitive profile; otherwise it's no-op and does not affect existing AutoFDO. **Interface** There're two sets of interfaces for query and tracking respectively exposed from SampleContextTracker. For query, now instead of simply getting a profile from input for a function, we can explicitly query base profile or context profile for given call path of a function. For tracking, there're separate APIs for marking context profile as inlined, or promoting and merging not inlined context profile. - Query base profile (`getBaseSamplesFor`) Base profile is the merged synthetic profile for function's CFG profile from any outstanding (not inlined) context. We can query base profile by function. - Query context profile (`getContextSamplesFor`) Context profile is a function's CFG profile for a given calling context. We can query context profile by context string. - Track inlined context profile (`markContextSamplesInlined`) When a function is inlined for given calling context, we need to mark the context profile for that context as inlined. This is to make sure we don't include inlined context profile when synthesizing base profile for that inlined function. - Track not-inlined context profile (`promoteMergeContextSamplesTree`) When a function is not inlined for given calling context, we need to promote the context profile tree so the not inlined context becomes top-level context. This preserve the sub-context under that function so later inline decision for that not inlined function will still have context profile for its call tree. Note that profile will be merged if needed when promoting a context profile tree if any of the node already exists at its promoted destination. **Implementation** Implementation-wise, `SampleContext` is created as abstraction for context. Currently it's a string for call path, and we can later optimize it to something more efficient, e.g. context id. Each `SampleContext` also has a `ContextState` indicating whether it's raw context profile from input, whether it's inlined or merged, whether it's synthetic profile created by compiler. Each `FunctionSamples` now has a `SampleContext` that tells whether it's base profile or context profile, and for context profile what is the context and state. On top of the above context representation, a custom trie tree is implemented to track and manager context profiles. Specifically, `SampleContextTracker` is implemented that encapsulates a trie tree with `ContextTireNode` as node. Each node of the trie tree represents a frame in calling context, thus the path from root to a node represents a valid calling context. We also track `FunctionSamples` for each node, so this trie tree can serve efficient query for context profile. Accordingly, context profile tree promotion now becomes moving a subtree to be under the root of entire tree, and merge nodes for subtree if this move encounters existing nodes. **Integration** `SampleContextTracker` is now also integrated with AutoFDO, `SampleProfileReader` and `SampleProfileLoader`. When we detected input profile contains context-sensitive profile, `SampleContextTracker` will be used to track profiles, and all profile query will go to `SampleContextTracker` instead of `SampleProfileReader` automatically. Tracking APIs are called automatically for each inline decision from `SampleProfileLoader`. Differential Revision: https://reviews.llvm.org/D90125
2020-10-22[NFC][SampleFDO] Move some common stuff from ↵Wei Mi1-11/+13
SampleProfileReaderExtBinary/WriterExtBinary to their parent classes. SampleProfileReaderExtBinary/SampleProfileWriterExtBinary specify the typical section layout currently used by SampleFDO. Currently a lot of section reader/writer stay in the two classes. However, as we expect to have more types of SampleFDO profiles, we hope those new types of profiles can share the common sections while configuring their own sections easily with minimal change. That is why I move some common stuff from SampleProfileReaderExtBinary/SampleProfileWriterExtBinary to SampleProfileReaderExtBinaryBase/SampleProfileWriterExtBinaryBase so new profiles class inheriting from the base class can reuse them. Differential Revision: https://reviews.llvm.org/D89524
2020-08-26[SampleFDO] Enhance profile remapping support for searching inline instanceWei Mi1-7/+11
and indirect call promotion candidate. Profile remapping is a feature to match a function in the module with its profile in sample profile if the function name and the name in profile look different but are equivalent using given remapping rules. This is a useful feature to keep the performance stable by specifying some remapping rules when sampleFDO targets are going through some large scale function signature change. However, currently profile remapping support is only valid for outline function profile in SampleFDO. It cannot match a callee with an inline instance profile if they have different but equivalent names. We found that without the support for inline instance profile, remapping is less effective for some large scale change. To add that support, before any remapping lookup happens, all the names in the profile will be inserted into remapper and the Key to the name mapping will be recorded in a map called NameMap in the remapper. During name lookup, a Key will be returned for the given name and it will be used to extract an equivalent name in the profile from NameMap. So with the help of the NameMap, we can translate any given name to an equivalent name in the profile if it exists. Whenever we try to match a name in the module to a name in the profile, we will try the match with the original name first, and if it doesn't match, we will use the equivalent name got from remapper to try the match for another time. In this way, the patch can enhance the profile remapping support for searching inline instance and searching indirect call promotion candidate. In a planned large scale change of int64 type (long long) to int64_t (long), we found the performance of a google internal benchmark degraded by 2% if nothing was done. If existing profile remapping was enabled, the performance degradation dropped to 1.2%. If the profile remapping with the current patch was enabled, the performance degradation further dropped to 0.14% (Note the experiment was done before searching indirect call promotion candidate was added. We hope with the remapping support of searching indirect call promotion candidate, the degradation can drop to 0% in the end. It will be evaluated post commit). Differential Revision: https://reviews.llvm.org/D86332
2020-05-10[gcov] Fix .gcda decoding and support GCC 8, 9 and 10Fangrui Song1-1/+1
GCDAProfiling.c unnecessarily writes function names to .gcda files. GCC 4.2 gcc/libgcov.c (now renamed to libgcc/libgcov*) did not write function names. gcov-7 (compatible) crashes on .gcda produced by libclang_rt.profile rL176173 realized the problem and introduced a mode to remove function names. llvm-cov code apparently takes GCDAProfiling.c output format as truth and tries to decode function names. Additionally, llvm-cov tries to decode tags in certain order which does not match libgcov emitted .gcda files. This patch fixes the .gcda decoder and makes it work with GCC 8 and 9 (10 is compatible with 9). Note, line statistics are broken and not fixed by this patch. Add test/tools/llvm-cov/gcov-{4.7,8,9}.c to test compatibility.
2020-04-07Recommit [SampleFDO] Add flag for partial profile.Wei Mi1-1/+32
Fix the error of show-prof-info.test on some platforms without zlib. The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them. Differential Revision: https://reviews.llvm.org/D77426
2020-04-07Revert "[SampleFDO] Add flag for partial profile." show-prof-info.test ↵Wei Mi1-32/+1
breaks on some platforms. This reverts commit e3ba652a1440794eff0b43ce747f1b0488585d22.
2020-04-07[SampleFDO] Add flag for partial profile.Wei Mi1-1/+32
The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them. Differential Revision: https://reviews.llvm.org/D77426
2020-03-30[SampleFDO] Port MD5 name table support to extbinary format.Wei Mi1-18/+56
Compbinary format uses MD5 to represent strings in name table. That gives smaller profile without the need of compression/decompression when writing/reading the profile. The patch adds the support in extbinary format. It is off by default but user can choose to enable it. Note the feature of using MD5 in name table can bring very small chance of name conflict leading to profile mismatch. Besides, profile using the feature won't have the profile remapping support. Differential Revision: https://reviews.llvm.org/D76255
2020-02-10Revert "Remove redundant "std::move"s in return statements"Bill Wendling1-2/+2
The build failed with error: call to deleted constructor of 'llvm::Error' errors. This reverts commit 1c2241a7936bf85aa68aef94bd40c3ba77d8ddf2.
2020-02-10Remove redundant "std::move"s in return statementsBill Wendling1-2/+2
2020-01-28Make llvm::StringRef to std::string conversions explicit.Benjamin Kramer1-4/+4
This is how it should've been and brings it more in line with std::string_view. There should be no functional change here. This is mostly mechanical from a custom clang-tidy check, with a lot of manual fixups. It uncovers a lot of minor inefficiencies. This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2019-10-18[SampleFDO] Add profile remapping support for profile on-demand loading usedWei Mi1-37/+83
by ExtBinary format profile Profile on-demand loading was added for ExtBinary format profile in rL374233, but currently profile on-demand loading doesn't work well with profile remapping. The patch adds the support. Suppose a function in the current module has outline instance in the profile. The function name in the module is different from the name of the outline instance, but remapper knows the two names are equal. When loading profile on-demand, the outline instance has to be loaded with remapper's help. At the same time SampleProfileReaderItaniumRemapper is changed from a proxy of SampleProfileReader to a helper member in SampleProfileReader. Differential Revision: https://reviews.llvm.org/D68901 llvm-svn: 375295
2019-10-09[SampleFDO] Add indexing for function profiles so they can be loaded on demandWei Mi1-21/+81
in ExtBinary format Currently for Text, Binary and ExtBinary format profiles, when we compile a module with samplefdo, even if there is no function showing up in the profile, we have to load all the function profiles from the profile input. That is a waste of compile time. CompactBinary format profile has already had the support of loading function profiles on demand. In this patch, we add the support to load profile on demand for ExtBinary format. It will work no matter the sections in ExtBinary format profile are compressed or not. Experiment shows it reduces the time to compile a server benchmark by 30%. When profile remapping and loading function profiles on demand are both used, extra work needs to be done so that the loading on demand process will take the name remapping into consideration. It will be addressed in a follow-up patch. Differential Revision: https://reviews.llvm.org/D68601 llvm-svn: 374233
2019-10-07Fix build errors caused by rL373914.Wei Mi1-1/+2
llvm-svn: 373919
2019-10-07[SampleFDO] Add compression support for any section in ExtBinary profile formatWei Mi1-21/+63
Previously ExtBinary profile format only supports compression using zlib for profile symbol list. In this patch, we extend the compression support to any section. User can select some or all of the sections to compress. In an experiment, for a 45M profile in ExtBinary format, compressing name table reduced its size to 24M, and compressing all the sections reduced its size to 11M. Differential Revision: https://reviews.llvm.org/D68253 llvm-svn: 373914
2019-09-21Recommit [SampleFDO] Expose an interface to return the size of a sectionWei Mi1-0/+30
or the size of the profile for profile in ExtBinary format. Fix a test failure on Mac. [SampleFDO] Expose an interface to return the size of a section or the size of the profile for profile in ExtBinary format. Sometimes we want to limit the size of the profile by stripping some functions with low sample count or by stripping some function names with small text size from profile symbol list. That requires the profile reader to have the interfaces returning the size of a section or the size of total profile. The patch add those interfaces. At the same time, add some dump facility to show the size of each section. Differential revision: https://reviews.llvm.org/D67726 llvm-svn: 372478
2019-09-21Revert "[SampleFDO] Expose an interface to return the size of a section or ↵Amara Emerson1-30/+0
the size" This reverts commit f118852046a1d255ed8c65c6b5db320e8cea53a0. Broke the macOS build/greendragon bots. llvm-svn: 372464
2019-09-20[SampleFDO] Expose an interface to return the size of a section or the sizeWei Mi1-0/+30
of the profile for profile in ExtBinary format. Sometimes we want to limit the size of the profile by stripping some functions with low sample count or by stripping some function names with small text size from profile symbol list. That requires the profile reader to have the interfaces returning the size of a section or the size of total profile. The patch add those interfaces. At the same time, add some dump facility to show the size of each section. llvm-svn: 372439
2019-08-31[SampleFDO] Add profile symbol list section to discriminate function beingWei Mi1-0/+28
cold versus function being newly added. This is the second half of https://reviews.llvm.org/D66374. Profile symbol list is the collection of function symbols showing up in the binary which generates the current profile. It is used to discriminate function being cold versus function being newly added. Profile symbol list is only added for profile with ExtBinary format. During profile use compilation, when profile-sample-accurate is enabled, a function without profile will be regarded as cold only when it is contained in that list. Differential Revision: https://reviews.llvm.org/D66766 llvm-svn: 370563
2019-08-26[SampleFDO] Extract the code calling each section reader to readOneSection.Wei Mi1-20/+29
This is a followup of https://reviews.llvm.org/D66513. The code calling each section reader should be put into a separate function (readOneSection), so SampleProfileExtBinaryReader can override it. Otherwise, the base class SampleProfileExtBinaryBaseReader will need to be aware of all different kinds of section readers. That is not right. Differential Revision: https://reviews.llvm.org/D66693 llvm-svn: 369919