riscv-gnu-toolchain/llvm.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
13 days	[llvm-profgen] Extend llvm-profgen to generate vtable profiles with data ↵	Mingming Liu	1	-5/+50
	access events for non context-sensitive profiles using debug info (#148013) An RFC is in https://discourse.llvm.org/t/rfc-vtable-type-profiling-for-samplefdo/87283 This change extends to process perf data with Intel [MEM_INST_RETIRED.ALL_LOADS](https://perfmon-events.intel.com/index.html?pltfrm=skylake_server.html&evnt=MEM_INST_RETIRED.ALL_LOADS) samples and produce sample profiles with vtable information for non context-sensitive SampleFDO profiles. * For feature parity across different hardwares, future work could incorporate support for AMD Instruction-Based Sampling (IBS) and Arm Statistical Profiling Extension (SPE). --------- Co-authored-by: Paschalis Mpeis <paschalis.mpeis@arm.com>
2025-09-08	MC: Add Triple overloads for more MC constructors (#157321)	Matt Arsenault	1	-12/+12
	Avoids more Triple->string->Triple round trip. This is a continuation of f137c3d592e96330e450a8fd63ef7e8877fc1908
2025-08-18	llvm-profgen: Options cleanup / fixes (#147632)	Matthias Braun	1	-15/+20
	- Add `cl::cat(ProfGenCategory)` to non-hidden options so they show up in `--help` output. - Introduce `Options.h` for options referenced in multiple files.
2025-08-18	llvm-profgen: Avoid "using namespace" in headers (#147631)	Matthias Braun	1	-0/+1
	Avoid global `using namespace` directives in headers as they are bad style.
2025-04-28	Clean up external users of GlobalValue::getGUID(StringRef) (#129644)	Owen Rodley	1	-4/+5
	See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801 for context. This is a non-functional change which just changes the interface of GlobalValue, in preparation for future functional changes. This part touches a fair few users, so is split out for ease of review. Future changes to the GlobalValue implementation can then be focused purely on that class. This does the following: * Rename GlobalValue::getGUID(StringRef) to getGUIDAssumingExternalLinkage. This is simply making explicit at the callsite what is currently implicit. * Where possible, migrate users to directly calling getGUID on a GlobalValue instance. * Otherwise, where possible, have them call the newly renamed getGUIDAssumingExternalLinkage, to make the assumption explicit. There are a few cases where neither of the above are possible, as the caller saves and reconstructs the necessary information to compute the GUID themselves. We want to migrate these callers eventually, but for this first step we leave them be.
2025-03-20	[llvm] Use *Set::insert_range (NFC) (#132325)	Kazu Hirata	1	-2/+1
	DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch replaces: Dest.insert(Src.begin(), Src.end()); with: Dest.insert_range(Src); This patch does not touch custom begin like succ_begin for now.
2024-08-26	[MC][NFC] Statically allocate storage for decoded pseudo probes and function ↵	Amir Ayupov	1	-5/+5
	records Use #102774 to allocate storage for decoded probes (`PseudoProbeVec`) and function records (`InlineTreeVec`). Leverage that to also shrink sizes of `MCDecodedPseudoProbe`: - Drop Guid since it's accessible via `InlineTree`. `MCDecodedPseudoProbeInlineTree`: - Keep track of probes and inlinees using `ArrayRef`s now that probes and function records belonging to the same function are allocated contiguously. This reduces peak RSS from 13.7 GiB to 9.7 GiB and pseudo probe parsing time (as part of perf2bolt) from 15.3s to 9.6s for a large binary with 400MiB .pseudo_probe section containing 43M probes and 25M function records. Depends on: #102774 #102787 #102788 Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/102789
2024-08-23	[llvm] Use range-based for loops (NFC) (#105861)	Kazu Hirata	1	-2/+2

2024-08-10	[profgen][NFC] Pass parameter as const_ref	Amir Ayupov	1	-1/+2
	Pass `ProbeNode` parameter of `trackInlineesOptimizedAway` as const reference. Reviewers: wlei-llvm, WenleiHe Reviewed By: WenleiHe Pull Request: https://github.com/llvm/llvm-project/pull/102787
2024-07-07	[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#97914)	Kazu Hirata	1	-5/+4

2024-06-13	[llvm-profgen] Add support for Linux kenrel profile (#92831)	xur-llvm	1	-0/+7
	Add the support to handle Linux kernel perf files. The functionality is under option -kernel. Note that currently only main kernel (in vmlinux) is handled: kernel modules are not handled. --------- Co-authored-by: Han Shen <shenhan@google.com>
2024-03-15	[llvm-profgen] Support COFF binary (#83972)	Haohai Wen	1	-19/+38
	Intel Vtune/SEP has supported collecting LBR on Windows and generating perf-script file which is same format as Linux perf script. This patch teaches llvm-profgen to disassemble COFF binary so that we can do Sampling based PGO on Windows.
2024-01-30	[profgen] Use a 64bit integer for &'ing the loadable address (#79930)	Nathan Lanza	1	-1/+1
	For the linux kernel, the loadable segments start at 0xffff... and thus the 32 bit integer here was truncating all the meaningful bits. Grow it to 64 bits.
2023-10-22	[llvm-profgen] More tweaks to warnings (#68608)	Hongtao Yu	1	-15/+31
	Tweaking warnings more to avoid flooding user log.
2023-10-17	[llvm-profdata] Do not create numerical strings for MD5 function names read ↵	William Junda Huang	1	-3/+12
	from a Sample Profile. (#66164) This is phase 2 of the MD5 refactoring on Sample Profile following https://reviews.llvm.org/D147740 In previous implementation, when a MD5 Sample Profile is read, the reader first converts the MD5 values to strings, and then create a StringRef as if the numerical strings are regular function names, and later on IPO transformation passes perform string comparison over these numerical strings for profile matching. This is inefficient since it causes many small heap allocations. In this patch I created a class `ProfileFuncRef` that is similar to `StringRef` but it can represent a hash value directly without any conversion, and it will be more efficient (I will attach some benchmark results later) when being used in associative containers. ProfileFuncRef guarantees the same function name in string form or in MD5 form has the same hash value, which also fix a few issue in IPO passes where function matching/lookup only check for function name string, while returns a no-match if the profile is MD5. When testing on an internal large profile (> 1 GB, with more than 10 million functions), the full profile load time is reduced from 28 sec to 25 sec in average, and reading function offset table from 0.78s to 0.7s
2023-10-03	[llvm-profgen] Print DWP related warnings under show-detailed-warning (#68019)	Hongtao Yu	1	-7/+14
	Printing DWP related warnings under show-detailed-warning so that they won't flood user log.
2023-09-18	[llvm-profgen] Ignore inline frames with an emtpy function name (#66678)	Hongtao Yu	1	-1/+1
	Broken debug information can give empty names for an inlined frame, e.g, ``` 0x1d605c68: ryKeyINS7_17SmartCounterTypesEEESt10shared_ptrINS7_15AsyncCacheValueIS9_EEESaIhESt6atomicEEE9fetch_subElSt12memory_order at Filename: edata.h Function start filename: edata.h Function start line: 266 Function start address: 0x1d605c68 Line: 267 Column: 0 (inlined by) at Filename: edata.h Function start filename: edata.h Function start line: 274 Function start address: 0x1d605c68 Line: 275 Column: 0 (inlined by) _EEEmmEv at Filename: arena.c Function start filename: arena.c Function start line: 1303 Line: 1308 Column: 0 ``` This patch avoids creating a sample context with an empty function name by stopping tracking at that frame. This prevents a hash failure that leads to an ICE, where empty context serves at an empty key for the underlying MapVector https://github.com/llvm/llvm-project/blob/7624de5beae2f142abfdb3e32a63c263a586d768/llvm/lib/ProfileData/SampleProfWriter.cpp#L261
2023-08-30	[NFC] Remove unused variables declared in conditions	Takuya Shimizu	1	-1/+1
	D152495 makes clang warn on unused variables that are declared in conditions like `if (int var = init) {}` This patch is an NFC fix to suppress the new warning in llvm,clang,lld builds to pass CI in the above patch. Differential Revision: https://reviews.llvm.org/D158016
2023-06-23	[llvm-profgen] Remove target triple check to allow for more targets	Hongtao Yu	1	-3/+1
	Llvm-profgen internally uses the llvm libraries and the MCDesc interface to do disassembling and symblization and it never checks against target-specific instruction operators. This makes it quite transparent to targets and a first attempt for an aarch64 binary just works. Therefore I'm removing the unnecessary triple check to unblock for new targets. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D153449
2023-05-25	Avoid pointless canonicalize when using Dwarf names	Mark Santaniello	1	-5/+6
	CPU profile indicated memcmp was hot due to the two rfind calls in getCanonicalFnName. If UseSymbolTable is false, we can avoid the cost entirely. For CSSPGO profiles I've measured ~5% speedup with this change. Profile similarity before/after matches 100%. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D151441
2023-02-07	[NFC][TargetParser] Remove llvm/ADT/Triple.h	Archibald Elliott	1	-1/+1
	I also ran `git clang-format` to get the headers in the right order for the new location, which has changed the order of other headers in two files.
2023-01-16	[llvm-objdump][RISCV] Use new common method to parse ARCH RISCV attribute	Elena Lepilkina	1	-2/+4
	Differential Revision: https://reviews.llvm.org/D139553
2022-12-17	std::optional::value => operator*/operator->	Fangrui Song	1	-3/+3
	value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable without _LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). This fixes clang.
2022-12-16	[CSSPGO][llvm-profgen] Missing frame inference.	Hongtao Yu	1	-7/+64
	This change introduces a missing frame inferrer aiming at fixing missing frames. It current only handles missing frames due to the compiler tail call elimination (TCE) but could also be extended to supporting other scenarios like frame pointer omission. When a tail called function is sampled, the caller frame will be missing from the call chain because the caller frame is reused for the callee frame. While TCE is beneficial to both perf and reducing stack overflow, a workaround being made in this change aims to find back the missing frames as much as possible. The idea behind this work is to build a dynamic call graph that consists of only tail call edges constructed from LBR samples and DFS-search for a unique path for a given source frame and target frame on the graph. The unique path will be used to fill in the missing frames between the source and target. Note that only a unique path counts. Multiple paths are treated unreachable since we don't want to overcount for any particular possible path. A switch --infer-missing-frame is introduced and defaults to be on. Some testing results: - 0.4% perf win according to three internal benchmarks. - About 2/3 of the missing tail call frames can be recovered, according to an internal benchmark. - 10% more profile generation time. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D139367
2022-12-05	[DebugInfo] llvm::Optional => std::optional	Fangrui Song	1	-1/+1
	https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-26	[llvm-profgen] Use std::optional in ProfiledBinary.cpp (NFC)	Kazu Hirata	1	-1/+2
	This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-10-27	[PseudoProbe] Replace relocation with offset for entry probe.	Hongtao Yu	1	-12/+51
	Currently pseudo probe encoding for a function is like: - For the first probe, a relocation from it to its physical position in the code body - For subsequent probes, an incremental offset from the current probe to the previous probe The relocation could potentially cause relocation overflow during link time. I'm now replacing it with an offset from the first probe to the function start address. A source function could be lowered into multiple binary functions due to outlining (e.g, coro-split). Since those binary function have independent link-time layout, to really avoid relocations from .pseudo_probe sections to .text sections, the offset to replace with should really be the offset from the probe's enclosing binary function, rather than from the entry of the source function. This requires some changes to previous section-based emission scheme which now switches to be function-based. The assembly form of pseudo probe directive is also changed correspondingly, i.e, reflecting the binary function name. Most of the source functions end up with only one binary function. For those don't, a sentinel probe is emitted for each of the binary functions with a different name from the source. The sentinel probe indicates the binary function name to differentiate subsequent probes from the ones from a different binary function. For examples, given source function ``` Foo() { … Probe 1 … Probe 2 } ``` If it is transformed into two binary functions: ``` Foo: … Foo.outlined: … ``` The encoding for the two binary functions will be separate: ``` GUID of Foo Probe 1 GUID of Foo Sentinel probe of Foo.outlined Probe 2 ``` Then probe1 will be decoded against binary `Foo`'s address, and Probe 2 will be decoded against `Foo.outlined`. The sentinel probe of `Foo.outlined` makes sure there's not accidental relocation from `Foo.outlined`'s probes to `Foo`'s entry address. On the BOLT side, to be minimal intrusive, the pseudo probe re-encoding sticks with the old encoding format. This is fine since unlike linker, Bolt processes the pseudo probe section as a whole and it is free from relocation overflow issues. The change is downwards compatible as long as there's no mixed use of the old encoding and the new encoding. Reviewed By: wenlei, maksfb Differential Revision: https://reviews.llvm.org/D135912 Differential Revision: https://reviews.llvm.org/D135914 Differential Revision: https://reviews.llvm.org/D136394
2022-10-25	[llvm-profgen] Do not cache the frame location stack during computing ↵	wlei	1	-5/+6
	inlined context size In `computeInlinedContextSizeForRange`, the offset of range is only used one time, there is no need to cache the frame location stack. Measured on one internal service binary, this can save 2GB memory usage and reduce a small run time (avoid one hash search). Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D128859
2022-10-13	[llvm-profgen] Fix inconsistent loading address issues	wlei	1	-65/+54
	This is to fix two issues related with loading address: 1) When multiple MMAPs occur and their loading address are different, before it only used the first MMap as base address, all perf address after it used the wrong base address. 2) For pseudo probe profile, the address is always based on preferred loading address. If the base address is not equal to the preferred loading address, the pseudo probe address query will be wrong. Solution: Instead of converting the address to offset lazily, right now all the address after parsing are converted on the fly based on preferred loading address in the parsing time. There is no "offset" used in profile generator any more. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D126827
2022-07-13	[llvm] Use value instead of getValue (NFC)	Kazu Hirata	1	-3/+3

2022-06-27	[CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie	wlei	1	-9/+7
	This is the followup patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Before the promotion and merging of the context is based on the SampleContext(the array of frame), this causes a lot of cost to the memory. This patch detaches the tracker from using the array ref instead to use the context trie itself. This can save a lot of memory usage and benefit both the compiler's CS inliner and llvm-profgen's pre-inliner. One structure needs to be specially treated is the `FuncToCtxtProfiles`, this is used to get all the functionSamples for one function to do the merging and promoting. Before it search each functions' context and traverse the trie to get the node of the context. Now we don't have the context inside the profile, instead we directly use an auxiliary map `ProfileToNodeMap` for profile , it initialize to create the FunctionSamples to TrieNode relations and keep updating it during promoting and merging the node. Moreover, I was expecting the results before and after remain the same, but I found that the order of FuncToCtxtProfiles matter and affect the results. This can happen on recursive context case, but the difference should be small. Now we don't have the context, so I just used a vector for the order, the result is still deterministic. Measured on one huge size(12GB) profile from one of our internal service. The profile similarity difference is 99.999%, and the running time is improved by 3X(debug mode) and the memory is reduced from 170GB to 90GB. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D127031
2022-06-25	[llvm] Don't use Optional::hasValue (NFC)	Kazu Hirata	1	-6/+5
	This patch replaces Optional::hasValue with the implicit cast to bool in conditionals only.
2022-06-25	Revert "Don't use Optional::hasValue (NFC)"	Kazu Hirata	1	-8/+9
	This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.
2022-06-25	Don't use Optional::hasValue (NFC)	Kazu Hirata	1	-9/+8

2022-06-05	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	Fangrui Song	1	-4/+2

2022-06-03	[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC	Fangrui Song	1	-4/+4
	Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded. Also remove cl::init(false) while touching the lines.
2022-05-12	[llvm-profgen] Filter out oversized LBR ranges.	Hongtao Yu	1	-3/+8
	As a follow up to {D123271}, LBR ranges that are too big should also be considered as invalid. For example, the last two pairs in the following trace form a range [0x0d7b02b0, 0x368ba706] that covers a ton of functions in the binary. Such oversized range should also be ignored. 0x0c74505f/0x368b99a0 0x368ba706/0x0c745040 0x0d7b1c3f/0x0d7b02b0 Add a defensive check to filter out those ranges based that the valid range should not cross the unconditional branch(Call, return, unconditional jmp). Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D125448
2022-04-28	[llvm-profgen] Decouple artificial branch from LBR parser and fix external ↵	wlei	1	-0/+2
	address related issues This patch is fixing two issues for both CS and non-CS. 1) For external-call-internal, the head samples of the the internal function should be recorded. 2) avoid ignoring LBR after meeting the interrupt branch for CS profile LBR parser is shared between CS and non-CS, we found it's error-prone while dealing with artificial branch inside LBR parser. Since artificial branch is mainly used for CS profile unwinding, this patch tries to simplify LBR parser by decoupling artificial branch code from it, the concept of artificial branch is removed and split into two transitional branches(internal-to-external, external-to-internal). Then we leave all the processing of external branch to unwinder. Specifically for unwinder, remembering that we introduce external frame in https://reviews.llvm.org/D115550. We can just take external address as a regular address and reuse current unwind function(unwindCall, unwindReturn). For a normal case, the external frame will match an external LBR, and it will be filtered out by `unwindLinear` without losing any context. The data also shows that the interrupt or standalone LBR pattern(unpaired case) does exist, we choose to handle it by clearing the call stack and keeping unwinding. Here we leverage checking in `unwindLinear`, because a standalone LBR, no matter its type, since it doesn’t have other part to pair, it will eventually cause a wrong linear range, like [external, internal], [internal, external]. Then set the state to invalid there. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D118177
2022-03-23	[llvm-profgen] Decoding pseudo probe for profiled function only.	Hongtao Yu	1	-11/+50
	Complete pseudo probes decoding can result in large memory usage. In practice only a small porting of the decoded probes are used in profile generation. I'm changing the full decoding mode to be decoding for profiled functions only, though we still do a full scan of the .pseudoprobe section due to a missing table-of-content but we don't have to build the in-memory data structure for functions not sampled. To build the in-memory data structure for profiled functions only, I'm rewriting the previous non-recursive probe decoding logic to be recursive. This is easy to read and maintain. I also have to change the previous representation of unsymbolized context from probe-based stack to address-based stack since the profiled functions are unknown yet by the time of virtual unwinding. The address-based stack will be converted to probe-based stack after virtual unwinding and on-demand probe decoding. I'm seeing 20GB memory is saved for one of our internal large service. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D121643
2022-02-24	Cleanup include: DebugInfo/Symbolize	serge-sans-paille	1	-0/+1
	Estimation of the impact on preprocessor output after: 1067349756 before:1067487786 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120433
2022-02-23	[llvm-profgen] Support symbol loading for debug fission	wlei	1	-54/+85
	Support to load debug info from dwarf split file, like .dwo, .dwp files. Leverage the `getNonSkeletonUnitDIE(false)` API to achieve this. Add test cause to make sure all the ranges is well retrieved by the loader. Reviewed By: ayermolo, hoy, wenlei Differential Revision: https://reviews.llvm.org/D115973
2022-02-08	[llvm-profgen] On-demand track optimized-away inlinees for preinliner.	Hongtao Yu	1	-4/+30
	Tracking optimized-away inlinees based on all probes in a binary is expansive in terms of memory usage I'm making the tracking on-demand based on profiled functions only. This saves about 10% memory overall for a medium-sized benchmark. Before: note: After parsePerfTraces note: Thu Jan 27 18:42:09 2022 note: VM: 8.68 GB RSS: 8.39 GB note: After computeSizeForProfiledFunctions note: Thu Jan 27 18:42:41 2022 note: VM: 10.63 GB RSS: 10.20 GB note: After generateProbeBasedProfile note: Thu Jan 27 18:45:49 2022 note: VM: 25.00 GB RSS: 24.95 GB note: After postProcessProfiles note: Thu Jan 27 18:49:29 2022 note: VM: 26.34 GB RSS: 26.27 GB After: note: After parsePerfTraces note: Fri Jan 28 12:04:49 2022 note: VM: 8.68 GB RSS: 7.65 GB note: After computeSizeForProfiledFunctions note: Fri Jan 28 12:05:26 2022 note: VM: 8.68 GB RSS: 8.42 GB note: After generateProbeBasedProfile note: Fri Jan 28 12:08:03 2022 note: VM: 22.93 GB RSS: 22.89 GB note: After postProcessProfiles note: Fri Jan 28 12:11:30 2022 note: VM: 24.27 GB RSS: 24.22 GB This should be a no-diff change in terms of profile quality. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D118515
2022-02-02	[llvm-profgen] Use cast<> instead of dyn_cast<> to avoid dereference of nullptr	Simon Pilgrim	1	-2/+2
	The pointers are dereferenced immediately, so assert the cast is correct instead of returning nullptr
2022-01-24	[llvm-profgen] Support to load debug info from a second binary	wlei	1	-5/+14
	For reducing binary size purpose, the binary's debug info and executable segment can be separated(like using objcopy --only-keep-debug). Here add support in llvm-profgen to use two binaries as input. The original one is executable binary and added for debug info only binary. Adding a flag `--debug-binary=file-path`, with this, the binary will load debug info from debug binary. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D115948
2022-01-14	[llvm-profgen] ProfiledBinary::load - use cast<> instead of dyn_cast<> to ↵	Simon Pilgrim	1	-1/+1
	avoid dereference of nullptr The pointer is always dereferenced immediately, so assert the cast is correct instead of returning nullptr
2021-12-14	[llvm-profgen] Skip disassembling for PLT section	wlei	1	-1/+5
	Skip disassembling .plt section, then .plt section code will be treated as external code. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D115699
2021-12-08	[llvm-profgen] Fix total samples related issues	wlei	1	-5/+1
	Since total sample and body sample are used to compute hotness threshold in compiler, we found in some services changing the total samples computation will cause noticeable regression. Hence, here we will revert the changes and just keep all total samples number identical to the old tool. Three changes in this diff: 1. Revert previous diff(https://reviews.llvm.org/D112672: [llvm-profgen] Update total samples by accumulating all its body samples) and put it under a switch. 2. Keep the negative line number. Although compiler doesn't consume the count but it will be used to compute hot threshold. 3. Change to accumulate total samples per byte instead of per instruction. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D115013
2021-11-30	[FS-AFDO][llvm-profgen] Generate profile with FS-AFDO discriminator	wlei	1	-1/+24
	In order to support generating profile with FS discriminator, three kind of changes are done in llvm-profgen: 1) Dissassemble .rodata section to check if FS discriminator var ('"__llvm_fs_discriminator__"') exists and set the corresponding flag in the binary. 2) Change the discriminator decoding in `getBaseDiscriminator` and `getDuplicationFactor`. 3) set true for `FunctionSamples::ProfileIsFS` to enable FS functionality in ProfileData. Reviewed By: xur, hoy, wenlei Differential Revision: https://reviews.llvm.org/D113296
2021-11-15	[llvm-profgen] Add switch to allow use of first loadable segment for ↵	Wenlei He	1	-1/+5
	calculating offset Adding `-use-loadable-segment-as-base` to allow use of first loadable segment for calculating offset. By default first executable segment is used for calculating offset. The switch helps compatibility with unsymbolized profile generated from older tools. Differential Revision: https://reviews.llvm.org/D113727
2021-11-12	[llvm-profgen] Fix bug of setting function entry	wlei	1	-7/+38
	Previously we set `isFuncEntry` flag to true when the funcName from DWARF is equal to the name in symbol table and we use this flag to ignore reporting callsite sample that's from an intra func branch. However, in HHVM, it appears that the symbol table name is inconsistent with the dwarf info func name, it's likely due to `OptimizeGlobalAliases`. This change is a workaround in llvm-profgen side to mark the only one range as the function entry and add warnings for the remaining inconsistence. This also fixed a missing `getCanonicalFnName` for symbol name which caused the mismatching as well. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D113492