rocket-tools/riscv-gnu-toolchain/llvm.git

Age	Commit message	Author	Files	Lines
2025-07-31	Revert "[PGO] Add `llvm.loop.estimated_trip_count` metadata" (#151585)	Joel E. Denny	1	-6/+2
	Reverts llvm/llvm-project#148758 [As requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)
2025-07-31	[PGO] Add `llvm.loop.estimated_trip_count` metadata (#148758)	Joel E. Denny	1	-2/+6
	This patch implements the `llvm.loop.estimated_trip_count` metadata discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785). As [suggested in the RFC comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4), it adds the new metadata to all loops at the time of profile ingestion and estimates each trip count from the loop's `branch_weights` metadata. As [suggested in the PR #128785 review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036), it does so via a new `PGOEstimateTripCountsPass` pass, which creates the new metadata for each loop but omits the value if it cannot estimate a trip count due to the loop's form. An important observation not previously discussed is that `PGOEstimateTripCountsPass` often cannot estimate a loop's trip count, but later passes can sometimes transform the loop in a way that makes it possible. Currently, such passes do not necessarily update the metadata, but eventually that should be fixed. Until then, if the new metadata has no value, `llvm::getLoopEstimatedTripCount` disregards it and tries again to estimate the trip count from the loop's current `branch_weights` metadata.
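The estimate described above can be sketched in isolation. The helper below is hypothetical (it is not `llvm::getLoopEstimatedTripCount`) and assumes the loop's latch branch carries exactly two weights, one for the backedge and one for the exit; rounding to nearest is also an assumption of this sketch.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper, not LLVM's API: estimate a loop's trip count from the
// two weights on its latch branch.
//   backedgeWeight: how often the branch went back to the loop header
//   exitWeight:     how often the branch left the loop
std::optional<uint64_t> estimateTripCount(uint64_t backedgeWeight,
                                          uint64_t exitWeight) {
  // With no observed exit there is nothing to divide by, so give no estimate.
  if (exitWeight == 0)
    return std::nullopt;
  // Average number of backedges taken per loop entry, rounded to nearest.
  uint64_t backedgesPerEntry = (backedgeWeight + exitWeight / 2) / exitWeight;
  // Each entry executes the body once more than it takes the backedge.
  return backedgesPerEntry + 1;
}
```

Returning an empty `std::optional` mirrors the "metadata present but without a value" state: the caller can tell "could not estimate" apart from a real estimate and retry later, once the loop's form has changed.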
2025-07-04	[Passes] Move LoopInterchange into optimization pipeline (#145503)	Ryotaro Kasuga	1	-3/+4
	As mentioned in https://github.com/llvm/llvm-project/pull/145071, LoopInterchange should be part of the optimization pipeline rather than the simplification pipeline. This patch moves LoopInterchange into the optimization pipeline. More context: - By default, LoopInterchange attempts to improve data locality; however, it also takes increasing vectorization opportunities into account. Given that, it is reasonable to run it as close to vectorization as possible. - I looked into previous changes related to the placement of LoopInterchange, but couldn’t find any strong motivation suggesting that it benefits other simplifications. - In the tests I tried (including llvm-test-suite), removing LoopInterchange from the simplification pipeline does not affect other simplifications. Therefore, there doesn't seem to be much value in keeping it there. - The new position reduces compile-time for ThinLTO, probably because it only runs once per function in post-link optimization, rather than both in pre-link and post-link optimization. I haven't encountered any cases where the positional difference affects optimization results, so please feel free to revert if you run into any issues.
2025-06-23	[TRE] Adjust function entry count when using instrumented profiles (#143987)	Mircea Trofin	1	-3/+11
	The entry count of a function needs to be updated after a callsite is elided by TRE: before elision, the entry count accounted for the recursive call at that callsite. After TRE, we need to remove that callsite's contribution. This patch enables this for instrumented profiling cases because, there, we know the function entry count captured entries before TRE. We cannot currently address this for sample-based profiles, because we don't know whether this function was TRE-ed in the binary that donated samples.
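The adjustment amounts to a subtraction. A minimal sketch, assuming the profile provides an execution count for the elided recursive callsite; the function name is hypothetical, and clamping at zero is a choice made for this sketch rather than something the commit message states.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: the function entry count once a recursive callsite has
// been turned into a loop backedge by tail recursion elimination.
uint64_t entryCountAfterTRE(uint64_t entryCountBefore,
                            uint64_t elidedCallsiteCount) {
  // Entries that came through the elided callsite no longer enter the
  // function. Clamp so that inconsistent counts cannot wrap around.
  return entryCountBefore - std::min(entryCountBefore, elidedCallsiteCount);
}
```

For example, a function entered 1000 times, 900 of them through its own tail call, is left with 100 entries from outside callers.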
2025-06-23	[Passes] Remove LoopInterchange from O1 pipeline (#145071)	Nikita Popov	1	-3/+0
	This is a fairly exotic pass; I don't think it makes a lot of sense to run it at O1, especially as vectorization wouldn't run at O1 anyway.
2025-06-05	Add SimplifyTypeTests pass.	Peter Collingbourne	1	-0/+3
	This pass figures out whether inlining has exposed a constant address to a lowered type test, and removes the test if so and the address is known to pass the test. Unfortunately this pass ends up needing to reverse engineer what LowerTypeTests did; this is currently inherent to the design of ThinLTO importing where LowerTypeTests needs to run at the start. Reviewers: teresajohnson Reviewed By: teresajohnson Pull Request: https://github.com/llvm/llvm-project/pull/141327
2025-06-05	[MemProf] Split MemProfiler into Instrumentation and Use. (#142811)	Snehasish Kumar	1	-1/+2
	Most of the recent development on the MemProfiler has been on the Use part. The instrumentation has been quite stable for a while. As the complexity of the use grows (with undrifting, diagnostics etc) I figured it would be good to separate these two implementations.
2025-06-04	[llvm] Remove unused includes (NFC) (#142733)	Kazu Hirata	1	-2/+0
	These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-05-14	[LV][EVL] Introduce the EVLIndVarSimplify Pass for EVL-vectorized loops (#131005)	Min-Yih Hsu	1	-0/+1
	When we enable EVL-based loop vectorization with predicated tail-folding, each vectorized loop effectively has two induction variables: one calculates the step using (VF x vscale) and the other increases the IV by the values returned from experimental.get.vector.length. The former, also known as the canonical IV, is more favorable for analyses as it's "countable" in the sense of SCEV; the latter (the EVL-based IV), however, is more favorable to codegen, at least for targets that support scalable vectors, like AArch64 SVE and RISC-V. The idea is that we use the canonical IV all the way until the end of all vectorizers, where we replace it with the EVL-based IV using the EVLIndVarSimplify pass introduced here, so that we get the best of both worlds. This pass is enabled by default in RISC-V. However, since loops are not yet vectorized with predicated tail-folding by default, this pass is a no-op at the moment.
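The two induction variables can be compared with a small scalar model. This is only a sketch: `VF` stands for the number of lanes handled per iteration (VF x vscale in the text), and the vector-length intrinsic is assumed to return `min(VF, remaining)`, which simplifies its actual contract. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Steps taken by an EVL-style IV over N elements: each iteration advances by
// the number of elements it actually processed.
std::vector<uint64_t> evlSteps(uint64_t N, uint64_t VF) {
  assert(VF > 0 && "need at least one lane per iteration");
  std::vector<uint64_t> Steps;
  for (uint64_t I = 0; I < N;) {
    uint64_t EVL = std::min(VF, N - I); // assumed intrinsic behaviour
    Steps.push_back(EVL);
    I += EVL;
  }
  return Steps;
}

// Iterations of the canonical IV, which always steps by VF and relies on
// tail-folding predication to mask off the excess lanes.
uint64_t canonicalIterations(uint64_t N, uint64_t VF) {
  assert(VF > 0 && "need at least one lane per iteration");
  return (N + VF - 1) / VF;
}
```

Under these assumptions both forms run the same number of iterations. The canonical form has a fixed step that is easy to count, while the EVL form carries the exact per-iteration element count.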
2025-05-07	[AA] Move Target Specific AA before BasicAA (#125965)	Chengjun	1	-0/+4
	In this change, NVPTX AA is moved before Basic AA to potentially improve compile time. Additionally, it introduces a flag in the `ExternalAAWrapper` that allows other backends to run their target-specific AA passes before Basic AA, if desired. The change works for both New Pass Manager and Legacy Pass Manager. Original implementation by Princeton Ferro <pferro@nvidia.com>
2025-04-14	[LTO][Pipelines] Add 0 hot-caller threshold for SamplePGO + FullLTO (#135152)	Tianle Liu	1	-8/+8
	If a hot callsite is not inlined in the first build, inlining it in the pre-link stage of the second SPGO build may leave its function samples not found in the profile file at the link stage, so some profile information is lost. ThinLTO already handles this by setting HotCallSiteThreshold to 0 to stop the inlining. This patch adds the same handling for FullLTO.
2025-04-08	[ctxprof] Use the flattened contextual profile pre-thinlink (#134723)	Mircea Trofin	1	-2/+4
	Flatten the profile pre-thinlink so that ThinLTO has something to work with for the parts of the binary that aren't covered by contextual profiles. Post-thinlink, the flattener is re-run and will actually change profile info, but just for the modules containing contextual trees ("specialized modules"). For the rest, the flattener just yanks out the instrumentation.
2025-04-07	[fatlto] Add coroutine passes when using FatLTO with ThinLTO (#134434)	Paul Kirth	1	-0/+13
	When coroutines are used w/ both -ffat-lto-objects and -flto=thin, the coroutine passes are not added to the optimization pipelines. Ensure they are added before ModuleOptimization to generate a working ELF object. Fixes #134409.
2025-03-13	[InstrProf] Remove -forder-file-instrumentation (#130192)	Ellis Hoag	1	-8/+0

2025-03-06	Revert "[LTO][Pipelines][Coro] De-duplicate Coro passes" (#129977)	Vitaly Buka	1	-14/+14
	Reverts llvm/llvm-project#128654. Breaks FatLTO: https://github.com/llvm/llvm-project/pull/128654#issuecomment-2700053700
2025-02-26	[ctxprof] Override type of instrumentation if `-profile-context-root` is specified (#128940)	Mircea Trofin	1	-4/+4
	This patch makes it easy to enable ctxprof instrumentation for targets where the build has a bunch of defaults for instrumented PGO that we want to inherit for ctxprof. This is switching experimental defaults: we'll eventually enable ctxprof instrumentation through `PGOOpt` but that type is currently quite entangled and, for the time being, no point adding to that.
2025-02-26	[ctxprof] don't inline weak symbols after instrumentation (#128811)	Mircea Trofin	1	-0/+7
	Contextual profiling identifies functions by GUID. Functions that may get overridden by the linker with a prevailing copy may have, during instrumentation, different variants in different modules. If these variants get inlined before linking (here I assume thinlto), they will identify themselves to the ctxprof runtime as their GUID, leading to issues - they may have different counter counts, for instance. If we block their inlining in the pre-thinlink compilation, only the prevailing copy will survive post-thinlink and the confusion is avoided. The change introduces a small pass just for this purpose, which marks any symbols that could be affected by the above as `noinline` (even if they were `alwaysinline`). We already carried out some inlining (via the preinliner), before instrumenting, so technically the `alwaysinline` directives were honored. We could later (different patch) choose to mark them back to their original attribute (none or `alwaysinline`) post-thinlink, if we want to - but experimentally that doesn't really change much of the performance of the instrumented binary.
2025-02-25	[LTO][Pipelines][Coro] De-duplicate Coro passes (#128654)	Vitaly Buka	1	-14/+14
	```
	if (!isLTOPostLink(Phase))
	  CoroPM.addPass(CoroEarlyPass());
	if (!isLTOPreLink(Phase))
	  // Other Coro passes
	```
	Followup to #126168.
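The phase checks in this commit message can be tried out in a self-contained model. Everything below is a stand-in for illustration: `LTOPhase` mimics LLVM's `ThinOrFullLTOPhase`, the two predicates are local reimplementations, and the "pipeline" is just a list of names, with `"other-coro-passes"` as a placeholder for the passes the message does not list.

```cpp
#include <string>
#include <vector>

// Local stand-in for LLVM's ThinOrFullLTOPhase.
enum class LTOPhase {
  None,
  ThinLTOPreLink,
  ThinLTOPostLink,
  FullLTOPreLink,
  FullLTOPostLink
};

bool isLTOPreLink(LTOPhase P) {
  return P == LTOPhase::ThinLTOPreLink || P == LTOPhase::FullLTOPreLink;
}

bool isLTOPostLink(LTOPhase P) {
  return P == LTOPhase::ThinLTOPostLink || P == LTOPhase::FullLTOPostLink;
}

// Which coroutine passes the scheme above would schedule in a given phase.
std::vector<std::string> coroPassesFor(LTOPhase P) {
  std::vector<std::string> Passes;
  if (!isLTOPostLink(P))
    Passes.push_back("coro-early"); // already ran pre-link
  if (!isLTOPreLink(P))
    Passes.push_back("other-coro-passes"); // deferred to post-link
  return Passes;
}
```

In this model a non-LTO build gets both groups, a pre-link build gets only the early pass, and a post-link build gets only the remaining passes, so nothing runs twice across the two LTO stages.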
2025-02-25	[LTO][Pipelines][NFC] Extract isLTOPostLink (#128653)	Vitaly Buka	1	-2/+7

2025-02-13	[llvm][fatlto] Add FatLTOCleanup pass (#125911)	Paul Kirth	1	-0/+7
	When using FatLTO, it is common to want to enable certain types of whole program optimizations (WPD) or security transforms (CFI), so that they can be made available when performing LTO. However, these transforms should not be used when compiling the non-LTO object code. Since the frontend must emit different IR, we cannot simply clone the module and optimize the LTO section and non-LTO section differently to work around this. Instead, we need to remove any problematic instruction sequences. This patch adds a new pass whose responsibility is to clean up the IR in the FatLTO pipeline after creating the bitcode section, which is after running the pre-link pipeline but before running module optimization. This allows us to safely drop any conflicting instructions or IR constructs that are inappropriate for non-LTO compilation.
2025-02-12	[LTO][Pipelines][Coro] Handle coroutines in LTO pipeline (#126168)	Vitaly Buka	1	-1/+14
	ThinLTO delays handling of coroutines to ThinLTO backend. However it's usually possible to use ThinLTO prelink objects for FullLTO. In this case we have left-over coroutines which crash in codegen. Issue #104525.
2025-02-11	[NFC][Pipelines] Extract buildCoroConditionalWrapper (#126860)	Vitaly Buka	1	-8/+14
	Helper for #126168. `Phase` will be used in followup patches.
2025-02-07	[Clang][Driver] Add an option to control loop-interchange (#125830)	Sjoerd Meijer	1	-5/+6
	This introduces options `-floop-interchange` and `-fno-loop-interchange` to enable/disable the loop-interchange pass. This is part of the work that tries to get that pass enabled by default (#124911), where it was remarked that a user facing option to control this would be convenient to have. The option name is the same as GCC's.
2025-02-06	[OpenMP] Fix the OpenMPOpt pass incorrectly optimizing if definition was missing	Joseph Huber	1	-2/+2
	Summary: This code is intended to block transformations if the call isn't present, however the way it's coded it silently lets it pass if the definition doesn't exist at all. This previously was always valid since we included the runtime as one giant blob so everything was always there, but now that we want to move towards separate ones, it's not quite correct.
2025-01-29	[PassBuilder] VectorizerEnd Extension Points (#123494)	Axel Sorenson	1	-0/+18
	Added an extension point after vectorizer passes in the PassBuilder. Additionally, added extension points before and after vectorizer passes in `buildLTODefaultPipeline`. Credit goes to @mshockwave for guiding me through my first LLVM contribution (and my first open source contribution in general!) :) - Implemented `registerVectorizerEndEPCallback` - Implemented `invokeVectorizerEndEPCallbacks` - Added `VectorizerEndEPCallbacks` SmallVector - Added a command line option `passes-ep-vectorizer-end` to `NewPMDriver.cpp` - `buildModuleOptimizationPipeline` now calls `invokeVectorizerEndEPCallbacks` - `buildO0DefaultPipeline` now calls `invokeVectorizerEndEPCallbacks` - `buildLTODefaultPipeline` now calls BOTH `invokeVectorizerStartEPCallbacks` and `invokeVectorizerEndEPCallbacks` - Added LIT tests to `new-pm-defaults.ll`, `new-pm-lto-defaults.ll`, `new-pm-O0-ep-callbacks.ll`, and `pass-pipeline-parsing.ll` - Renamed `CHECK-EP-Peephole` to `CHECK-EP-PEEPHOLE` in `new-pm-lto-defaults.ll` for consistency. This code is intended for developers that wish to implement and run custom passes after the vectorizer passes in the PassBuilder pipeline. For example, in #91796, a pass was created that changed the induction variables of vectorized code. This is right after the vectorization passes.
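The register/invoke pattern behind an extension point can be modelled in a few lines. This is a toy, not the real `PassBuilder`: `MiniPassBuilder` is hypothetical, a pipeline is a list of pass names, the callback takes only the pipeline (the real callbacks take more arguments), and the passes placed around the extension point are illustrative.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of a pass builder with one extension point.
class MiniPassBuilder {
public:
  using Pipeline = std::vector<std::string>;
  using Callback = std::function<void(Pipeline &)>;

  // Clients register callbacks before the pipeline is built.
  void registerVectorizerEndEPCallback(Callback C) {
    VectorizerEndEPCallbacks.push_back(std::move(C));
  }

  // The builder invokes them at a fixed position: right after the vectorizers.
  Pipeline buildModuleOptimizationPipeline() const {
    Pipeline PM;
    PM.push_back("loop-vectorize");
    PM.push_back("slp-vectorizer");
    invokeVectorizerEndEPCallbacks(PM);
    PM.push_back("instcombine");
    return PM;
  }

private:
  void invokeVectorizerEndEPCallbacks(Pipeline &PM) const {
    for (const Callback &C : VectorizerEndEPCallbacks)
      C(PM);
  }

  std::vector<Callback> VectorizerEndEPCallbacks;
};
```

A client that registers a callback appending `"my-custom-pass"` would see it land between the last vectorizer and the cleanup pass, without editing the builder itself.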
2025-01-28	[PassBuilder] Add RelLookupTableConverterPass to LTO (#124053)	gulfemsavrun	1	-3/+3
	This patch adds RelLookupTableConverterPass to the LTO post-link optimization pass pipeline. This optimization converts lookup tables to relative lookup tables to make them PIC-friendly, and is already included in the non-LTO pass pipeline. Adding it to the post-link optimization pipeline discovers more opportunities in the LTO context.
2025-01-08	[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955)	Ryan Mansfield	1	-4/+4

2024-12-04	[Passes] Generalize ShouldRunExtraVectorPasses to allow re-use (NFCI). (#118323)	Florian Hahn	1	-2/+3
	Generalize ShouldRunExtraVectorPasses to ShouldRunExtraPasses, to allow re-use for other transformations. PR: https://github.com/llvm/llvm-project/pull/118323
2024-11-13	[CGData] Global Merge Functions (#112671)	Kyungwoo Lee	1	-0/+1
	This implements a global function merging pass. Unlike traditional function merging passes that use IR comparators, this pass employs a structurally stable hash to identify similar functions while ignoring certain constant operands. These ignored constants are tracked and encoded into a stable function summary. When merging, instead of explicitly folding similar functions and their call sites, we form a merging instance by supplying different parameters via thunks. The actual size reduction occurs when identically created merging instances are folded by the linker. Currently, this pass is wired to a pre-codegen pass, enabled by the `-enable-global-merge-func` flag. In a local merging mode, the analysis and merging steps occur sequentially within a module: - `analyze`: Collects stable function hashes and tracks locations of ignored constant operands. - `finalize`: Identifies merge candidates with matching hashes and computes the set of parameters that point to different constants. - `merge`: Uses the stable function map to optimistically create a merged function. We can enable a global merging mode similar to the global function outliner (https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753/), which will perform the above steps separately. - `-codegen-data-generate`: During the first round of code generation, we analyze local merging instances and publish their summaries. - Offline using `llvm-cgdata` or at link-time, we can finalize all these merging summaries that are combined to determine parameters. - `-codegen-data-use`: During the second round of code generation, we optimistically create merging instances within each module, and finally, the linker folds identically created merging instances. Depends on #112664 This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
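The effect of merging through thunks can be illustrated at source level. The pass itself works on IR and uses stable hashes to find candidates; the functions below are hypothetical and only show what a merging instance looks like once the differing constant has become a parameter.

```cpp
// Two functions that are structurally identical except for one constant.
int scaleByThree(int X) { return X * 3 + 1; }
int scaleBySeven(int X) { return X * 7 + 1; }

// Merged body: the constant that differed is now a parameter.
int mergedScale(int X, int Factor) { return X * Factor + 1; }

// Thunks keep the original signatures and supply the per-function constant.
int scaleByThreeThunk(int X) { return mergedScale(X, 3); }
int scaleBySevenThunk(int X) { return mergedScale(X, 7); }
```

Within one module this alone saves little. As the message notes, the size reduction comes when identically created merged bodies from different modules are folded by the linker.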
2024-11-08	[SampleFDO] Support enabling sample loader pass in O0 mode (#113985)	Lei Wang	1	-0/+13
	Add support for enabling the sample loader pass in O0 mode (under `-fprofile-sample-use`). This can help verify PGO raw profile count quality or provide a more accurate performance proxy (predictor), as O0 mode has minimal or no compiler optimizations that might otherwise impact profile count accuracy. - Explicitly disable the sample loader inlining to ensure it only emits sampling annotation. - Use flattened profile for O0 mode. - Add the pass after `AddDiscriminatorsPass` pass to work with `-fdebug-info-for-profiling`.
2024-11-07	[Coroutines] Inline the `.noalloc` ramp function marked coro_safe_elide (#114004)	Yuxuan Chen	1	-3/+3

2024-11-06	Reland "[LTO] Run Argument Promotion before IPSCCP" (#111853)	Hari Limaye	1	-4/+9
	Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more constants to be propagated. We also run PostOrderFunctionAttrs to improve the information available to ArgumentPromotion's alias analysis, and SROA to clean up allocas. Relands #111163.
2024-11-03	[PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline (#114577)	Shilei Tian	1	-10/+14

2024-11-03	[PassBuilder] Add `ThinOrFullLTOPhase` to early simplification EP call backs (#114547)	Shilei Tian	1	-4/+4
	The early simplification pipeline is used in non-LTO and (Thin/Full)LTO pre-link stage. There are some passes that we want in non-LTO mode, but not at LTO pre-link stage. The control is missing currently. This PR adds the support. To demonstrate the use, we only enable the internalization pass in non-LTO mode for AMDGPU because having it run in pre-link stage causes some issues.
2024-11-01	[PassBuilder] Replace `bool LTOPreLink` with `ThinOrFullLTOPhase Phase` (#114564)	Shilei Tian	1	-13/+11
	This will allow more fine-grained control in the future.
2024-10-31	[InstrPGO] Avoid using global variable to fix potential data race (#114364)	Lei Wang	1	-1/+21
	https://github.com/llvm/llvm-project/pull/109837 set a global variable (`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp, which introduced a data race detected by TSan. To fix this, I decoupled the flag setting: the flags are now set separately (`instrument-cold-function-only-path` is required to be used with `--pgo-instrument-cold-function-only`).
2024-10-31	[llvm][fatlto] Drop any CFI related instrumentation after emitting bitcode (#112788)	Paul Kirth	1	-0/+7
	We want to support CFI instrumentation for the bitcode section, without miscompiling the object code portion of a FatLTO object. We can reuse the existing mechanisms in the LowerTypeTestsPass to do that, by just adding the pass to the FatLTO pipeline after the EmbedBitcodePass with the correct options set. Fixes #112053
2024-10-31	Revert "[InstrPGO] Support cold function coverage instrumentation (#109837)"	Dmitry Chernenkov	1	-16/+1
	This reverts commit e517cfc531886bf6ed64b4e7109bb3141ac7f430.
2024-10-30	[llvm] Allow always dropping all llvm.type.test sequences	Paul Kirth	1	-5/+10
	Currently, the `DropTypeTests` parameter only fully works with phi nodes and llvm.assume instructions. However, we'd like CFI to work in conjunction with FatLTO, in so far as the bitcode section should be able to contain the CFI instrumentation, while any incompatible bits are dropped when compiling the object code. To do that, we need to drop the llvm.type.test instructions everywhere, and not just their uses in phi nodes. This patch updates the LowerTypeTest pass so that uses are removed, and replaced with `true` in all cases, and not just in phi nodes. Addressing this will allow us to fix #112053 by modifying the FatLTO pipeline. Reviewers: pcc, nikic Reviewed By: pcc Pull Request: https://github.com/llvm/llvm-project/pull/112787
2024-10-28	[InstrPGO] Support cold function coverage instrumentation (#109837)	Lei Wang	1	-1/+16
	This patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead. More details about the implementation and flags: - Added a flag `--pgo-instrument-cold-function-only` in `PGOInstrumentation.cpp` as the main switch to control skipping the instrumentation. - Built the extra instrumentation passes (a bundle of passes in `addPGOInstrPasses`) under the sampling PGO pipeline. This is controlled by the `--instrument-cold-function-only-path` flag. - Added a driver flag `-fprofile-generate-cold-function-coverage`, which: - 1) Configures the flags in one place, i.e. adds `--instrument-cold-function-only-path=<...>` and `--pgo-function-entry-coverage`. Note that the instrumentation file path is passed through `--instrument-sample-cold-function-path`, because we cannot use the `PGOOptions.ProfileFile` as it's already used by `-fprofile-sample-use=<...>`. - 2) Makes the linker link the `compiler_rt.profile` library (see [ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131)). - Added a flag (`--pgo-cold-instrument-entry-threshold`) to configure the entry count used to determine whether a function is cold. Overall, the full command looks like: ``` clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...> code.cc -o code ```
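The filtering rule reduces to one comparison. A minimal sketch, assuming the sampling profile supplies an entry count for the function and that "cold" means the count does not exceed the threshold; the function name, the default of 0, and the direction of the comparison are assumptions of this sketch, not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical predicate: should this function receive coverage
// instrumentation when only cold functions are instrumented?
bool shouldInstrumentForColdCoverage(uint64_t sampleEntryCount,
                                     uint64_t coldEntryThreshold = 0) {
  // Functions the sampling profile already saw running are skipped, which is
  // where the reduction in instrumentation overhead comes from.
  return sampleEntryCount <= coldEntryThreshold;
}
```

With the default threshold, only functions the sampling profile never saw entered are instrumented. Raising the threshold widens the set to rarely entered functions as well.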
2024-10-11	[MemProf] Support cloning for indirect calls with ThinLTO (#110625)	Teresa Johnson	1	-2/+5
	This patch enables support for cloning in indirect callsites. This is done by synthesizing callsite records for each virtual call target from the profile metadata. In the thin link all the synthesized records for a particular indirect callsite initially share the same context node, but support is added to partition the callsites and outgoing edges based on the callee function, creating a separate node for each target. In the LTO backend, when cloning is needed we first perform indirect call promotion, then change the target of the new direct call to the desired clone. Note this is ThinLTO-specific, since for regular LTO indirect call promotion should have already occurred.
2024-10-10	[Passes] Remove -enable-infer-alignment-pass flag (#111873)	Arthur Eubanks	1	-6/+2
	This flag has been on for a while without any complaints.
2024-10-10	Revert "[LTO] Run Argument Promotion before IPSCCP" (#111839)	Hari Limaye	1	-9/+0
	Reverts llvm/llvm-project#111163, as this was merged prematurely.
2024-10-10	[LTO] Run Argument Promotion before IPSCCP (#111163)	Hari Limaye	1	-0/+9
	Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more constants to be propagated. We also run PostOrderFunctionAttrs to improve the information available to ArgumentPromotion's alias analysis, and SROA to clean up allocas.
2024-09-11	[ctx_prof] Relax the "profile use" case around `PGOOpt` (#108265)	Mircea Trofin	1	-3/+3
	`PGOOpt` could have a value if, for instance, debug info for profiling is requested. Relax the requirement for now; eventually we would refactor `PGOOpt` to better capture the supported interplay between the various profiling options.
2024-09-09	[LLVM][Coroutines] Switch CoroAnnotationElidePass to a FunctionPass (#107897)	Yuxuan Chen	1	-3/+3
	After landing https://github.com/llvm/llvm-project/pull/99285 we found that the call graph update was causing the following crash when expensive checks are turned on ``` llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:982: LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(LazyCallGraph &, LazyCallGraph::SCC &, LazyCallGraph::Node &, CGSCCAnalysisManager &, CGSCCUpdateResult &, FunctionAnalysisManager &, bool): Assertion `(RC == &TargetRC || RC->isAncestorOf(TargetRC)) && "New call edge is not trivial!"' failed. ``` I have to admit I believe that the call graph update process I did for that patch could be wrong. After reading the code in `CGSCCToFunctionPassAdaptor`, I am convinced that `CoroAnnotationElidePass` can be a FunctionPass and rely on the adaptor to update the call graph for us, so long as we properly invalidate the caller's analyses. After this patch, `llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll` no longer fails under expensive checks.
2024-09-09	[ctx_prof] Insert the ctx prof flattener after the module inliner (#107499)	Mircea Trofin	1	-5/+14
	This patch enables experimenting with the contextual profile. ICP is currently disabled in this case - will reenable it subsequently. Also subsequently the inline cost model / decision making would be updated to be context-aware. Right now, this just achieves "complete use" of the profile, in that it's ingested, maintained, and sunk to a flat profile when not needed anymore. Issue [#89287](https://github.com/llvm/llvm-project/issues/89287)
2024-09-08	[LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (#99285)	Yuxuan Chen	1	-2/+8
	This patch is episode three of the middle end implementation for the coroutine HALO improvement project published on discourse: https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044 After we attribute the calls to some coroutines as "coro_elide_safe" in the C++ FE and create a `noalloc` ramp function, we use a new middle end pass to move the call to coroutines to the noalloc variant. This pass should be run after CoroSplit. For each node we process in CoroSplit, we look for its callers and replace the attributed ones in presplit coroutines with the noalloc one. The transformed `noalloc` ramp function will also require a frame pointer to a block of memory it can use as an activation frame. We allocate this on the caller's frame with an alloca. Please note that we cannot safely transform such attributed calls in post-split coroutines due to memory lifetime reasons. The CoroSplit pass is responsible for creating the coroutine frame spills for all the allocas in the coroutine. Therefore it will be unsafe to create new allocas like this one in post-split coroutines. This happens relatively rarely because CGSCC performs the passes on the callees before the caller. However, if multiple coroutines coexist in one SCC, this situation does happen (and prevents us from having potentially unbounded frame size due to recursion.) You can find episode 1: Clang FE of this patch series at https://github.com/llvm/llvm-project/pull/99282 Episode 2: CoroSplit at https://github.com/llvm/llvm-project/pull/99283
2024-09-06	[NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (#107471)	Mingming Liu	1	-10/+0
	The primary motivation is to remove `EntryCount` from `FunctionSummary`. This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of https://github.com/llvm/llvm-project/commit/64498c54831bed9cf069e0923b9b73678c6451d8). While I'm at it, this PR cleans up {SummaryBasedOptimizations, SyntheticCountsPropagation} since they were not used and there are no plans to further invest in them. With this patch, bitcode writer writes a placeholder 0 at the byte offset of `EntryCount` and bitcode reader can parse the function entry count at the correct byte offset. Added a TODO to stop writing `EntryCount` and bump bitcode version
2024-09-06	[ctx_prof] Flattened profile lowering pass (#107329)	Mircea Trofin	1	-0/+1
	Pass to flatten and lower the contextual profile to profile (i.e. `MD_prof`) metadata. This is expected to be used after all IPO transformations have happened. Prior to lowering, the instrumentation is maintained during IPO and the contextual profile is kept in sync (see PRs #105469, #106154). Flattening (#104539) sums up all the counters belonging to all a function's context nodes. We first propagate counter values (from the flattened profile) using the same propagation algorithm as `PGOUseFunc::populateCounters`, then map the edge values to `branch_weights`. Functions in the module that don't have an entry in the flattened profile are deemed cold, and any `MD_prof` metadata they may have is reset. The profile summary is also reset at this point. Issue [#89287](https://github.com/llvm/llvm-project/issues/89287)
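The flattening step is an element-wise sum. A minimal sketch, assuming each context node of a function is represented as a plain vector of counters indexed the same way; the function name and the data layout are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: sum the counters of all context nodes belonging to
// one function into a single flat counter vector.
std::vector<uint64_t>
flattenCounters(const std::vector<std::vector<uint64_t>> &contextNodes) {
  std::vector<uint64_t> Flat;
  for (const std::vector<uint64_t> &Counters : contextNodes) {
    // Grow on demand so that the first node determines nothing special.
    if (Flat.size() < Counters.size())
      Flat.resize(Counters.size(), 0);
    for (std::size_t I = 0; I < Counters.size(); ++I)
      Flat[I] += Counters[I];
  }
  return Flat;
}
```

A function with no context nodes yields an empty result, which corresponds to the "no entry in the flattened profile" case that the pass treats as cold.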