riscv-gnu-toolchain/llvm.git

Age	Commit message	Author	Files	Lines
14 days	[llvm-profgen] Extend llvm-profgen to generate vtable profiles with data access events for non context-sensitive profiles using debug info (#148013)	Mingming Liu	7	-27/+281
	An RFC is in https://discourse.llvm.org/t/rfc-vtable-type-profiling-for-samplefdo/87283
	This change extends llvm-profgen to process perf data with Intel [MEM_INST_RETIRED.ALL_LOADS](https://perfmon-events.intel.com/index.html?pltfrm=skylake_server.html&evnt=MEM_INST_RETIRED.ALL_LOADS) samples and produce sample profiles with vtable information for non context-sensitive SampleFDO profiles.
	* For feature parity across different hardware, future work could incorporate support for AMD Instruction-Based Sampling (IBS) and Arm Statistical Profiling Extension (SPE).
	Co-authored-by: Paschalis Mpeis <paschalis.mpeis@arm.com>
2025-09-09	[llvm-profgen] Add an option to mark all the profile context as preinlined (#156501)	Lei Wang	2	-0/+19
	Add a new option (`--mark-all-context-preinlined`) that marks all function samples with the `ContextShouldBeInlined` attribute during post-processing, so that the whole profile is treated as preinlined. This can be useful for experiments outside of the CS preinliner, e.g. to fully replay the inlining for a given profile.
2025-09-08	MC: Add Triple overloads for more MC constructors (#157321)	Matt Arsenault	1	-12/+12
	Avoids more Triple->string->Triple round trips. This is a continuation of f137c3d592e96330e450a8fd63ef7e8877fc1908
2025-08-18	llvm-profgen: Options cleanup / fixes (#147632)	Matthias Braun	7	-68/+109
	- Add `cl::cat(ProfGenCategory)` to non-hidden options so they show up in `--help` output. - Introduce `Options.h` for options referenced in multiple files.
2025-08-18	llvm-profgen: Avoid "using namespace" in headers (#147631)	Matthias Braun	9	-38/+27
	Avoid global `using namespace` directives in headers as they are bad style.
2025-06-24	[llvm] fix extern cl::opt definitions for DLL export (#145374)	Andrew Rogers	1	-2/+0
	## Purpose
	This patch is one in a series of code-mods that annotate LLVM’s public interface for export. This patch ensures a few `cl::opt` declarations are properly annotated with `LLVM_ABI`. The annotations currently have no meaningful impact on the LLVM build; however, they are a prerequisite to support an LLVM Windows DLL (shared library) build.
	## Background
	This effort is tracked in #109483. Additional context is provided in [this discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307), and documentation for `LLVM_ABI` and related annotations is found in the LLVM repo [here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
	## Overview
	- Remove local `extern` declarations of `llvm::PrintPipelinePasses` because it is already correctly declared with an `LLVM_ABI` annotation in `llvm\Passes\PassBuilder.h`. Leaving these declarations results in a gcc compile warning unless they are also annotated with `LLVM_ABI`.
	- Similarly, remove local `extern` declarations of `ProfileSummaryCutoffHot` and `UseContextLessSummary` from `llvm/tools/llvm-profgen/ProfileGenerator.cpp` since they are declared with `LLVM_ABI` in `llvm\ProfileData\ProfileCommon.h`.
	- Explicitly annotate the extern declaration of `ProfileCorrelate` in `clang/lib/CodeGen/BackendUtil.cpp` since it is not declared in a header. The definition of `ProfileCorrelate` in `llvm\lib\Transforms\Instrumentation\InstrProfiling.cpp` is already annotated with `LLVM_ABI`.
	## Validation
	Local builds and tests to validate cross-platform compatibility. This included llvm, clang, and lldb on the following configurations:
	- Windows with MSVC
	- Windows with Clang
	- Linux with GCC
	- Linux with Clang
	- Darwin with Clang
2025-04-28	Clean up external users of GlobalValue::getGUID(StringRef) (#129644)	Owen Rodley	1	-4/+5
	See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801 for context.
	This is a non-functional change which just changes the interface of GlobalValue, in preparation for future functional changes. This part touches a fair few users, so is split out for ease of review. Future changes to the GlobalValue implementation can then be focused purely on that class.
	This does the following:
	* Rename GlobalValue::getGUID(StringRef) to getGUIDAssumingExternalLinkage. This is simply making explicit at the callsite what is currently implicit.
	* Where possible, migrate users to directly calling getGUID on a GlobalValue instance.
	* Otherwise, where possible, have them call the newly renamed getGUIDAssumingExternalLinkage, to make the assumption explicit.
	There are a few cases where neither of the above are possible, as the caller saves and reconstructs the necessary information to compute the GUID themselves. We want to migrate these callers eventually, but for this first step we leave them be.
2025-04-20	[llvm] Call hash_combine_range with ranges (NFC) (#136511)	Kazu Hirata	1	-3/+1

2025-03-20	[llvm] Use *Set::insert_range (NFC) (#132325)	Kazu Hirata	1	-2/+1
	DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch replaces:
	    Dest.insert(Src.begin(), Src.end());
	with:
	    Dest.insert_range(Src);
	This patch does not touch custom begin like succ_begin for now.
2025-03-09	[llvm-profgen] Avoid repeated hash lookups (NFC) (#130466)	Kazu Hirata	1	-3/+3

2025-02-13	[llvm-profgen] Avoid repeated hash lookups (NFC) (#127028)	Kazu Hirata	1	-3/+4

2025-02-10	[llvm-profgen] Avoid repeated hash lookups (NFC) (#126467)	Kazu Hirata	1	-2/+3

2025-01-07	[CSSPGO]Add a flag to limit unsymbolized context depth (#121531)	Lei Wang	1	-1/+18
	Add a new flag (`--csprof-max-unsymbolized-context-depth`) that limits only the unsymbolized context depth. Currently, `--csprof-max-context-depth` applies to both symbolized and unsymbolized profile contexts, and there are scenarios where it may not be flexible enough. For example, if we want to limit the context but still keep all the inlinings from the leaf frame, we can set `--csprof-max-unsymbolized-context-depth` to a value >= 1.
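The effect of a depth limit on an unsymbolized (address-based) context can be sketched as follows. This is only an illustration, not the llvm-profgen implementation: the function name `trimToLeafFrames`, the caller-first frame ordering, and the "negative means unlimited" convention are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: keep only the MaxDepth frames closest to the leaf.
// Frames are assumed to be ordered caller-first, so the leaf is the last
// element. MaxDepth < 0 means "no limit" (an assumption of this sketch).
std::vector<uint64_t> trimToLeafFrames(const std::vector<uint64_t> &Frames,
                                       int MaxDepth) {
  if (MaxDepth < 0 || Frames.size() <= static_cast<std::size_t>(MaxDepth))
    return Frames;
  // Drop the outermost callers and keep the leaf-side tail.
  return std::vector<uint64_t>(Frames.end() - MaxDepth, Frames.end());
}
```

With a limit of 1, only the leaf address survives. Symbolizing that single address can still expand into several inlined frames, which is why a small unsymbolized limit does not discard the leaf's inlining information.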
2024-08-26	[MC][NFC] Reduce Address2ProbesMap size	Amir Ayupov	1	-5/+3
	Replace the map from addresses to list of probes with a flat vector containing probe references sorted by their addresses. Reduces pseudo probe parsing time from 9.56s to 8.59s and peak RSS from 9.66 GiB to 9.08 GiB as part of perf2bolt processing a large binary.
	Test Plan:
	```
	bin/llvm-lit -sv test/tools/llvm-profgen
	```
	Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm
	Reviewed By: wlei-llvm
	Pull Request: https://github.com/llvm/llvm-project/pull/102904
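The data-structure change described in this commit (a map from address to probe list replaced by one flat vector sorted by address) can be illustrated with a small sketch. The `Probe` struct and `FlatProbeMap` class below are hypothetical stand-ins written for this example and are not the actual `Address2ProbesMap` code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decoded pseudo probe.
struct Probe {
  uint64_t Address;
  uint32_t Index;
};

// Flat replacement for a map<address, list<probe>>: one contiguous vector
// sorted by address, queried with binary search.
class FlatProbeMap {
  std::vector<Probe> Probes;

  static bool byAddress(const Probe &A, const Probe &B) {
    return A.Address < B.Address;
  }

public:
  using Iter = std::vector<Probe>::const_iterator;

  explicit FlatProbeMap(std::vector<Probe> Input) : Probes(std::move(Input)) {
    // stable_sort keeps probes that share an address in their input order.
    std::stable_sort(Probes.begin(), Probes.end(), byAddress);
    assert(std::is_sorted(Probes.begin(), Probes.end(), byAddress));
  }

  // Returns the [first, last) range of probes located exactly at Address.
  std::pair<Iter, Iter> find(uint64_t Address) const {
    Iter Lo = std::lower_bound(
        Probes.begin(), Probes.end(), Address,
        [](const Probe &P, uint64_t A) { return P.Address < A; });
    Iter Hi = std::upper_bound(
        Lo, Probes.end(), Address,
        [](uint64_t A, const Probe &P) { return A < P.Address; });
    return {Lo, Hi};
  }
};
```

The trade-off is the usual one for sorted vectors: lookups become binary searches over contiguous memory with no per-node allocation, at the cost of sorting once up front.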
2024-08-26	[MC][NFC] Statically allocate storage for decoded pseudo probes and function records	Amir Ayupov	2	-8/+8
	Use #102774 to allocate storage for decoded probes (`PseudoProbeVec`) and function records (`InlineTreeVec`). Leverage that to also shrink sizes of
	`MCDecodedPseudoProbe`:
	- Drop Guid since it's accessible via `InlineTree`.
	`MCDecodedPseudoProbeInlineTree`:
	- Keep track of probes and inlinees using `ArrayRef`s now that probes and function records belonging to the same function are allocated contiguously.
	This reduces peak RSS from 13.7 GiB to 9.7 GiB and pseudo probe parsing time (as part of perf2bolt) from 15.3s to 9.6s for a large binary with 400MiB .pseudo_probe section containing 43M probes and 25M function records.
	Depends on: #102774 #102787 #102788
	Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm
	Reviewed By: wlei-llvm
	Pull Request: https://github.com/llvm/llvm-project/pull/102789
2024-08-23	[llvm] Use range-based for loops (NFC) (#105861)	Kazu Hirata	1	-2/+2

2024-08-10	[MC][profgen][NFC] Expand auto for MCDecodedPseudoProbe	Amir Ayupov	1	-1/+1
	Expand autos in select places in preparation for #102789. Reviewers: dcci, maksfb, WenleiHe, rafaelauler, ayermolo, wlei-llvm Reviewed By: WenleiHe, wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/102788
2024-08-10	[profgen][NFC] Pass parameter as const_ref	Amir Ayupov	2	-4/+6
	Pass `ProbeNode` parameter of `trackInlineesOptimizedAway` as const reference. Reviewers: wlei-llvm, WenleiHe Reviewed By: WenleiHe Pull Request: https://github.com/llvm/llvm-project/pull/102787
2024-08-02	[llvm-profgen] Revert #99826 and #99026 (#100147)	Tim Creech	2	-141/+20
	Revert #99826 and #99026 to allow for additional input.
2024-07-22	[llvm-profgen] Add --sample-period to estimate absolute counts (#99826)	Tim Creech	1	-0/+14
	Without `--sample-period`, no assumptions are made about perf profile sample frequencies. This is useful for comparing relative hotness of different program locations within the same profile. With `--sample-period`, LBR- and IP-based profile hit counts are adjusted to estimate the absolute total event count for each program location. This makes it reasonable to compare hit counts between different profiles, e.g., between two LBR-based execution frequency profiles with different sampling periods or between LBR-based execution frequency profiles and IP-based branch mispredict profiles. This functionality is in support of HWPGO[^1], which aims to enable feedback from a wider range of hardware events. [^1]: https://llvm.org/devmtg/2024-04/slides/TechnicalTalks/Xiao-EnablingHW-BasedPGO.pdf
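A minimal sketch of the scaling idea, under the assumption that "adjusted" means multiplying each sampled hit count by the sampling period. The commit text does not spell out the exact formula, so treat `estimateEventCount` and its "0 means no period given" convention as hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: estimate the absolute event count for a program
// location from the number of sampled hits. Period == 0 models the case
// where no sample period was supplied, so hits are returned unscaled.
uint64_t estimateEventCount(uint64_t Hits, uint64_t Period) {
  if (Period == 0)
    return Hits;
  assert(Hits <= UINT64_MAX / Period && "scaled count would overflow");
  return Hits * Period;
}
```

For example, 10 hits sampled every 1000 events and 5 hits sampled every 4000 events estimate 10000 and 20000 events respectively, so the second location is hotter even though it has fewer raw hits.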
2024-07-21	[llvm-profgen] Support creating profiles of arbitrary events (#99026)	Tim Creech	2	-20/+127
	This change introduces two options which may be used to create profiles of arbitrary PMU events.
	1. `--leading-ip-only` provides a simple sample-IP-based profile mode. This is not useful for building a profile of execution frequency, but it is useful for building new types of profiles. For example, to build a profile of unpredictable branches:
	    perf record -b -e branch-misses:upp -o perf.data ...
	    llvm-profgen --perfdata perf.data --leading-ip-only ...
	2. `--perf-event=event` enables the creation of a profile concerned with a specific event or set of events. The names given should match the "event" field as emitted by perf-script(1). This option has two spellings: `--perf-event` and `--perf-events`. The plural spelling accepts a comma-separated list. The singular spelling appends a single event name to the set of events which will be used. This is meant to accommodate event names containing commas.
	Combined, these options allow generating multiple kinds of profiles from a single `perf record` collection. For example, to generate both execution frequency and branch mispredict profiles:
	    perf record -c 1000003 -b -e br_inst_retired.near_taken:upp,br_misp_retired.all_branches:upp ...
	    llvm-profgen --output execution.prof --perf-event=br_inst_retired.near_taken:upp ...
	    llvm-profgen --leading-ip-only --output unpredictable.prof --perf-event=br_misp_retired.all_branches:upp ...
	These additions are in support of more general HWPGO[^1], allowing feedback from a wider range of hardware events.
	[^1]: https://llvm.org/devmtg/2024-04/slides/TechnicalTalks/Xiao-EnablingHW-BasedPGO.pdf
	Co-authored-by: Tim Creech <tcreech@tcreech.com>
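The difference between the singular and plural spellings can be sketched as below. `EventFilter` and its member names are hypothetical, as is the choice that an empty filter accepts every event; the sketch only mirrors the behaviour described in the commit message (plural splits on commas, singular appends the value verbatim).

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical holder for the set of selected perf event names.
struct EventFilter {
  std::set<std::string> Events;

  // Models the singular spelling: the whole value is one event name, so
  // names that themselves contain commas survive intact.
  void addEvent(const std::string &Name) {
    assert(!Name.empty() && "event name must not be empty");
    Events.insert(Name);
  }

  // Models the plural spelling: split the value on commas, skipping
  // empty pieces.
  void addEvents(const std::string &List) {
    std::string::size_type Start = 0;
    while (Start <= List.size()) {
      std::string::size_type End = List.find(',', Start);
      if (End == std::string::npos)
        End = List.size();
      if (End > Start)
        Events.insert(List.substr(Start, End - Start));
      Start = End + 1;
    }
  }

  // Assumption of this sketch: an empty filter accepts every event.
  bool matches(const std::string &Event) const {
    return Events.empty() || Events.count(Event) != 0;
  }
};
```

An event name such as `cpu/event=0xc4,umask=0x20/upp` (used here purely as an illustration of a name containing a comma) would be split into two bogus names by `addEvents`, which is the case the singular spelling is meant to cover.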
2024-07-09	[NFC] Coding style fixes: SampleProf (#98208)	Mircea Trofin	1	-1/+1
	Also some control flow simplifications. Notably, this doesn't address `sampleprof_error`. I think the style there tries to match `std::error_category`. Also left `hash_value` as-is, because it matches what we do in Hashing.h
2024-07-08	[MC][NFC] Fix typo in MCPseudoProbeFrameLocation (#98090)	Amir Ayupov	1	-1/+1

2024-07-07	[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#97914)	Kazu Hirata	1	-5/+4

2024-06-13	[llvm-project] Fix typo "seperate" (#95373)	Jay Foad	1	-1/+1

2024-06-13	[llvm-profgen] Add support for Linux kernel profile (#92831)	xur-llvm	4	-66/+123
	Add support for handling Linux kernel perf files. The functionality is under the option -kernel. Note that currently only the main kernel (in vmlinux) is handled; kernel modules are not handled.
	Co-authored-by: Han Shen <shenhan@google.com>
2024-05-24	[llvm-profgen] Improve sample profile density (#92144)	Lei Wang	2	-46/+110
	The profile density feature (the amount of samples in the profile relative to the program size) is used to identify insufficient-sample issues and to hint that the user should increase the sample count. A low-density profile can be inaccurate due to statistical noise, which can hurt FDO performance. This change introduces two improvements to the current density work.
	1. The density calculation/definition is changed. Previously, the density of a profile was calculated as the minimum density over all warm functions (a function was considered warm if its total samples were within the top N percent of the profile). However, a profile with high total samples can still have a very low density under this definition, which makes the density value unstable.
	- Instead, we want to find a density number such that a function whose density is below this value is considered a low-density function. We consider the whole profile bad if a group of low-density functions has a sum of samples that exceeds the N percent cut-off of the total samples.
	- In the implementation, we sort the function profiles by density, iterate over them in descending order, and keep accumulating the body samples until the sum exceeds (100% - N) percent of total_samples; the profile density is the last (minimum) function density among the processed functions. We introduce a flag (`--profile-density-threshold`) for this percentage threshold.
	2. The density is now calculated based on the final (compiler-used) profiles instead of the merged context-less profiles.
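The accumulation rule described above can be sketched in a few lines. This is a simplified model rather than the llvm-profgen code: `FuncStat`, `computeProfileDensity`, and the choice to pass each function's density in precomputed are assumptions of the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-function summary; Density is assumed to be precomputed
// (for example, samples relative to function size).
struct FuncStat {
  uint64_t BodySamples;
  double Density;
};

// Sort by density in descending order, accumulate body samples until the
// running sum exceeds (100 - CutoffPercent)% of all samples, and report
// the density of the last function processed.
double computeProfileDensity(std::vector<FuncStat> Funcs,
                             double CutoffPercent) {
  assert(CutoffPercent >= 0.0 && CutoffPercent <= 100.0);
  uint64_t Total = 0;
  for (const FuncStat &F : Funcs)
    Total += F.BodySamples;
  if (Total == 0)
    return 0.0;

  std::sort(Funcs.begin(), Funcs.end(),
            [](const FuncStat &A, const FuncStat &B) {
              return A.Density > B.Density;
            });

  const double Target = Total * (100.0 - CutoffPercent) / 100.0;
  uint64_t Accumulated = 0;
  double Density = 0.0;
  for (const FuncStat &F : Funcs) {
    Accumulated += F.BodySamples;
    Density = F.Density; // minimum density among functions seen so far
    if (Accumulated > Target)
      break;
  }
  return Density;
}
```

Because low-density functions are visited last, a handful of sparse functions carrying few samples no longer drags the reported density down; they only matter once their combined samples exceed the cut-off.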
2024-05-24	[llvm-profgen] Trim tail CR+LF for LBR record line (#93210)	Haohai Wen	1	-1/+1
	On Windows, the perf script generated by sep contains CR+LF at the end of each LBR record line. When llvm-profgen runs on Linux, this '\r' is treated as an LBR record and a warning is generated.
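A minimal sketch of the kind of trimming involved, written with the standard library only. The helper name `trimLineEnding` is hypothetical, the sample record string is illustrative, and the actual patch may use a different utility.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: strip any trailing CR or LF characters so that a
// line produced on Windows parses the same way as one produced on Linux.
std::string trimLineEnding(std::string Line) {
  while (!Line.empty() && (Line.back() == '\r' || Line.back() == '\n'))
    Line.pop_back();
  return Line;
}
```

Only the tail of the line is touched, so whitespace that separates LBR records inside the line is preserved.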
2024-04-11	[llvm-profgen] Remove temporary perf script files (#86668)	Haohai Wen	2	-0/+14
	The temporary perf script files converted from perf data can occupy a lot of space for large projects. This patch removes them when llvm-profgen exits normally or receives signals.
2024-03-15	[llvm-profgen] Support COFF binary (#83972)	Haohai Wen	3	-25/+63
	Intel VTune/SEP supports collecting LBR data on Windows and generating a perf-script file in the same format as Linux perf script. This patch teaches llvm-profgen to disassemble COFF binaries so that we can do sampling-based PGO on Windows.
2024-02-29	llvm-profgen: Fix race condition (#83489)	Matthias Braun	1	-3/+8
	Fix race condition when multiple instances of `llvm-profgen` read from the same inputs.
2024-02-16	[llvm-profgen] Filter out ambiguous cold profiles during profile generation (#81803)	Lei Wang	2	-0/+52
	For the built-in local initialization functions (`__cxx_global_var_init`, `__tls_init` prefix), there can be multiple versions of the function in the final binary. For example, for `__cxx_global_var_init`, which is a wrapper around global variable ctors, the compiler can assign suffixes like `__cxx_global_var_init.N` for different ctors. However, during profile generation we call `getCanonicalFnName` to canonicalize the names, which strips the suffixes. Therefore, samples from different functions query the same profile (only `__cxx_global_var_init`) and the counts are merged. As the functions are essentially different, entries of the merged profile are ambiguous. In sample loading, the IR of each version of the function would be attributed to the merged entries, which is inaccurate. This is especially harmful for fuzzy profile matching: it gets multiple callsites (from different functions) but uses them to match one callsite, which misleads the matching and reports a lot of false positives. Hence, we filter them out of the profile map at profile generation time. These profiles are all for cold functions, so there is no performance impact.
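The filtering step can be sketched as a prefix check over a profile map. The map type (name to sample count), the function names `isAmbiguousInitFunction` and `filterAmbiguousProfiles`, and the assumption that names are already canonicalized are all simplifications for this example, not the actual llvm-profgen data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Returns true if Name starts with one of the compiler-generated
// initializer prefixes mentioned in the commit message.
bool isAmbiguousInitFunction(const std::string &Name) {
  static const char *const Prefixes[] = {"__cxx_global_var_init",
                                         "__tls_init"};
  for (const char *Prefix : Prefixes)
    if (Name.rfind(Prefix, 0) == 0) // match only at position 0
      return true;
  return false;
}

// Hypothetical profile map: canonical function name -> total samples.
// Drops every entry whose merged counts would be ambiguous.
void filterAmbiguousProfiles(std::map<std::string, uint64_t> &Profiles) {
  for (auto It = Profiles.begin(); It != Profiles.end();) {
    if (isAmbiguousInitFunction(It->first))
      It = Profiles.erase(It);
    else
      ++It;
  }
}
```

Dropping the entries outright, rather than trying to split the merged counts, matches the reasoning in the commit message: the functions are cold, so losing their samples is cheap compared with feeding the matcher misleading callsites.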
2024-01-30	[profgen] Use a 64bit integer for &'ing the loadable address (#79930)	Nathan Lanza	1	-1/+1
	For the Linux kernel, the loadable segments start at 0xffff... and thus the 32-bit integer here was truncating all the meaningful bits. Grow it to 64 bits.
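One plausible way such a truncation arises is a page-alignment mask computed in 32 bits and then applied to a 64-bit address. The sketch below is illustrative only: the function names and the page-alignment scenario are assumptions, and the exact expression changed by the patch is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Buggy variant: ~(PageSize - 1U) is evaluated in 32 bits, so for a
// 4096-byte page the mask is 0xFFFFF000. Zero-extending it to 64 bits
// clears the upper half of the address.
uint64_t alignDownBuggy(uint64_t Addr, uint32_t PageSize) {
  return Addr & ~(PageSize - 1U);
}

// Fixed variant: widen to 64 bits before building the mask, so the upper
// bits of the mask are all ones.
uint64_t alignDownFixed(uint64_t Addr, uint32_t PageSize) {
  return Addr & ~(static_cast<uint64_t>(PageSize) - 1);
}
```

For user-space addresses below 4 GiB the two variants agree, which is why a bug of this shape can go unnoticed until kernel addresses in the 0xffff... range are processed.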
2023-12-24	[ProfileData] Copy CallTargetMaps a bit less. NFCI	Benjamin Kramer	1	-3/+2

2023-12-11	[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)	Kazu Hirata	1	-6/+6
	This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.
2023-12-03	[llvm] Stop including vector (NFC)	Kazu Hirata	2	-2/+0
	Identified with clangd.
2023-12-02	[llvm] Stop including list (NFC)	Kazu Hirata	2	-2/+0
	Identified with clangd.
2023-10-22	[llvm-profgen] More tweaks to warnings (#68608)	Hongtao Yu	2	-16/+36
	Tweaking warnings more to avoid flooding user log.
2023-10-17	[llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (#66164)	William Junda Huang	7	-35/+56
	This is phase 2 of the MD5 refactoring on Sample Profile following https://reviews.llvm.org/D147740 In the previous implementation, when an MD5 Sample Profile is read, the reader first converts the MD5 values to strings and then creates a StringRef as if the numerical strings were regular function names; later on, IPO transformation passes perform string comparisons over these numerical strings for profile matching. This is inefficient since it causes many small heap allocations. In this patch I created a class `ProfileFuncRef` that is similar to `StringRef` but can represent a hash value directly without any conversion, and it will be more efficient (I will attach some benchmark results later) when used in associative containers. ProfileFuncRef guarantees that the same function name in string form or in MD5 form has the same hash value, which also fixes a few issues in IPO passes where function matching/lookup only checks the function name string and returns a no-match if the profile is MD5. When testing on an internal large profile (> 1 GB, with more than 10 million functions), the full profile load time is reduced from 28 sec to 25 sec on average, and reading the function offset table from 0.78s to 0.7s.
2023-10-03	[llvm-profgen] Print DWP related warnings under show-detailed-warning (#68019)	Hongtao Yu	1	-7/+14
	Printing DWP related warnings under show-detailed-warning so that they won't flood user log.
2023-09-30	[profiling] Move option declarations into headers	Tom Stellard	1	-7/+1
	This will make it possible to add visibility attributes to these variables. This also fixes some type mismatches between the declaration and the definition. Reviewed By: bogner, huangjd Differential Revision: https://reviews.llvm.org/D156599
2023-09-18	[llvm-profgen] Ignore inline frames with an empty function name (#66678)	Hongtao Yu	1	-1/+1
	Broken debug information can give empty names for an inlined frame, e.g.,
	```
	0x1d605c68: ryKeyINS7_17SmartCounterTypesEEESt10shared_ptrINS7_15AsyncCacheValueIS9_EEESaIhESt6atomicEEE9fetch_subElSt12memory_order at Filename: edata.h Function start filename: edata.h Function start line: 266 Function start address: 0x1d605c68 Line: 267 Column: 0
	(inlined by) at Filename: edata.h Function start filename: edata.h Function start line: 274 Function start address: 0x1d605c68 Line: 275 Column: 0
	(inlined by) _EEEmmEv at Filename: arena.c Function start filename: arena.c Function start line: 1303 Line: 1308 Column: 0
	```
	This patch avoids creating a sample context with an empty function name by stopping tracking at that frame. This prevents a hash failure that leads to an ICE, where the empty context serves as an empty key for the underlying MapVector https://github.com/llvm/llvm-project/blob/7624de5beae2f142abfdb3e32a63c263a586d768/llvm/lib/ProfileData/SampleProfWriter.cpp#L261
2023-09-01	[llvm] Fix duplicate word typos. NFC	Fangrui Song	1	-1/+1
	Those fixes were taken from https://reviews.llvm.org/D137338
2023-08-30	[NFC] Remove unused variables declared in conditions	Takuya Shimizu	1	-1/+1
	D152495 makes clang warn on unused variables that are declared in conditions like `if (int var = init) {}` This patch is an NFC fix to suppress the new warning in llvm,clang,lld builds to pass CI in the above patch. Differential Revision: https://reviews.llvm.org/D158016
2023-08-17	[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map	William Huang	1	-13/+6
	This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use the MD5 hash code (instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduces the time it takes to construct the map. The optimization is based on the fact that many practical sample profiles use MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because that is extremely slow.
	Several changes to note:
	(1) For a non-CS SampleContext, if it is already an MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 functions (without converting them to strings) and regular function names.
	(2) The SampleProfileMap is a wrapper around *map<uint64_t, FunctionSamples> that provides an interface allowing SampleContext to be used as the key, so that existing code still works. It checks for MD5 collisions (unlikely but not too unlikely, since we only take the lower 64 bits) and handles them to at least guarantee compilation correctness (the conflicting old profile is dropped, instead of returning an old profile with an inconsistent context). Other code should not try to use MD5 as the key to access the map directly, because it will not be able to handle MD5 collisions at all (see the exception at (5)).
	(3) Any SampleProfileMap::emplace() followed by a SampleContext assignment if newly inserted should be replaced with SampleProfileMap::Create(), which does the same thing.
	(4) Previously we ensured the invariant that in SampleProfileMap the key is equal to the Context of the value, for profile maps that are eventually used for output (as in llvm-profdata/llvm-profgen). Since the key became an MD5 hash, only the value keeps the context now. In several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context that is otherwise irretrievable.
	(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (once to index into FuncOffsetTable, once into SampleProfileMap, and more if there are additional sections). In this case the SampleProfileMap is directly accessed with the MD5 value so that we don't recalculate it each time (expensive).
	Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.
	Reviewed By: davidxl, snehasish
	Differential Revision: https://reviews.llvm.org/D147740
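The hash-keyed map with collision handling described in points (2) to (4) can be sketched as follows. This is a simplified model and not the real `SampleProfileMap`: the class name `HashKeyedProfileMap`, the string-typed context, and the injectable hash function (a stand-in for the lower 64 bits of MD5, used here so that collisions can be forced) are all assumptions of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Simplified stand-in for a function profile. The context is stored in
// the value because the key is only a hash.
struct FunctionSamples {
  std::string Context;
  uint64_t TotalSamples = 0;
};

class HashKeyedProfileMap {
  using HashFn = std::function<uint64_t(const std::string &)>;
  std::unordered_map<uint64_t, FunctionSamples> Map;
  HashFn Hash; // stand-in for "lower 64 bits of MD5"

public:
  explicit HashKeyedProfileMap(HashFn H) : Hash(std::move(H)) {}

  // Get or create the profile for Context. If the slot holds a profile
  // for a different context (a hash collision), the old profile is
  // dropped rather than returned with an inconsistent context.
  FunctionSamples &create(const std::string &Context) {
    FunctionSamples &FS = Map[Hash(Context)];
    if (FS.Context != Context) {
      FS = FunctionSamples();
      FS.Context = Context; // "remember" the context right after insertion
    }
    return FS;
  }

  // Lookup by context; a colliding entry for another context is a miss.
  const FunctionSamples *find(const std::string &Context) const {
    auto It = Map.find(Hash(Context));
    if (It == Map.end() || It->second.Context != Context)
      return nullptr;
    return &It->second;
  }

  std::size_t size() const { return Map.size(); }
};
```

Keeping the context check inside `create` and `find` is what lets callers keep using contexts as keys safely; code that indexes the underlying map by raw hash would bypass that check, which is the hazard the commit message warns about.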
2023-07-28	Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"	Aaron Ballman	1	-6/+13
	This reverts commit 66ba71d913df7f7cd75e92c0c4265932b7c93292. Addressing issues found by: https://lab.llvm.org/buildbot/#/builders/245/builds/11732 https://lab.llvm.org/buildbot/#/builders/187/builds/12251 https://lab.llvm.org/buildbot/#/builders/186/builds/11099 https://lab.llvm.org/buildbot/#/builders/182/builds/6976
2023-07-27	[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map	William Huang	1	-13/+6
	This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use the MD5 hash code (instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduces the time it takes to construct the map. The optimization is based on the fact that many practical sample profiles use MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because that is extremely slow.
	Several changes to note:
	(1) For a non-CS SampleContext, if it is already an MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 functions (without converting them to strings) and regular function names.
	(2) The SampleProfileMap is a wrapper around *map<uint64_t, FunctionSamples> that provides an interface allowing SampleContext to be used as the key, so that existing code still works. It checks for MD5 collisions (unlikely but not too unlikely, since we only take the lower 64 bits) and handles them to at least guarantee compilation correctness (the conflicting old profile is dropped, instead of returning an old profile with an inconsistent context). Other code should not try to use MD5 as the key to access the map directly, because it will not be able to handle MD5 collisions at all (see the exception at (5)).
	(3) Any SampleProfileMap::emplace() followed by a SampleContext assignment if newly inserted should be replaced with SampleProfileMap::Create(), which does the same thing.
	(4) Previously we ensured the invariant that in SampleProfileMap the key is equal to the Context of the value, for profile maps that are eventually used for output (as in llvm-profdata/llvm-profgen). Since the key became an MD5 hash, only the value keeps the context now. In several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context that is otherwise irretrievable.
	(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (once to index into FuncOffsetTable, once into SampleProfileMap, and more if there are additional sections). In this case the SampleProfileMap is directly accessed with the MD5 value so that we don't recalculate it each time (expensive).
	Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.
	Reviewed By: davidxl, snehasish
	Differential Revision: https://reviews.llvm.org/D147740
2023-06-27	[CSSPGO][Preinliner] Always inline zero-sized functions.	Hongtao Yu	1	-0/+7
	Zero-sized functions should be cost-free in terms of size budget, so they should be considered during inlining even if we run out of size budget. This appears to give a 0.5% win for one of our internal services. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D153820
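The budget rule can be sketched as a single predicate. The function `fitsInlineBudget` and its parameters are hypothetical and ignore hotness and every other heuristic the preinliner uses; the sketch only shows the zero-size exemption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical size-budget check: a zero-sized callee is always accepted,
// even when the budget is already exhausted or exceeded.
bool fitsInlineBudget(uint64_t CalleeSize, uint64_t UsedSize,
                      uint64_t SizeLimit) {
  if (CalleeSize == 0)
    return true;
  // Written to avoid overflow in UsedSize + CalleeSize.
  return UsedSize <= SizeLimit && CalleeSize <= SizeLimit - UsedSize;
}
```

Since a zero-sized callee adds nothing to the used size, accepting it can never push the caller further over its limit.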
2023-06-27	Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"	Haojian Wu	1	-6/+13
	This reverts commit 12e9c7aaa66b7624b5d7666ce2794d912bf9e4b7. The commit has broken the buildbot, see comment https://reviews.llvm.org/D147740#4451540
2023-06-26	[CSSPGO][Preinliner] Bump up the threshold to favor previous compiler inline decision.	Hongtao Yu	1	-2/+15
	The compiler has more insight and knowledge about functions based on their IR and attributes and should make a better inline decision than the offline preinliner, which is purely based on callsite hotness and code size. Therefore I'm making changes to favor the previous compiler inline decision by bumping up the callsite allowance. This should improve the performance by more than 1% according to testing on Meta services. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D153797