path: root/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
Age | Commit message | Author | Files | Lines
5 hours | Cleanup the LLVM exported symbols namespace (#161240) | Nicolai Hähnle | 1 | -3/+8
There's a pattern throughout LLVM of cl::opts being exported. That in itself is probably a bit unfortunate, but what's especially bad about it is that a lot of those symbols are in the global namespace. Move them into the llvm namespace. While doing this, I noticed some other variables in the global namespace and moved them as well.
11 days | [InstCombine][nfc] Fix assert failure with function entry count equal to zero | Alan Zhao | 1 | -12/+13
We were hitting an assert discovered in https://github.com/llvm/llvm-project/pull/157768#issuecomment-3315359832
2025-09-11 | [FunctionSpecialization] Fix profile count preserving logic (#157939) | Alan Zhao | 1 | -2/+2
The previous fix in #157768 had a bug; instead of subtracting the original function's call count per call site of a specialization, we were subtracting the running total of the specialization's calls. Tracking issue: #147390
2025-09-10 | [FunctionSpecialization] Preserve call counts of specialized functions (#157768) | Alan Zhao | 1 | -1/+28
A function that has been specialized will have its function entry counts preserved as follows:
* Each specialization's count is the sum of each call site's basic block's number of entries as computed by `BlockFrequencyInfo`.
* The original function's count will be decreased by the counts of its specializations.
Tracking issue: #147390
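A minimal C++ sketch of the accounting described in #157768, assuming each redirected call site is summarised by the entry count of its basic block (the value `BlockFrequencyInfo` would supply). The `Specialization` struct and both functions are hypothetical illustrations, not LLVM APIs. The sketch subtracts each call site's own count from the original function once, which is the per-call-site behaviour that #157939 restored; clamping at zero is an assumption added here to tolerate inconsistent profiles.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model (not LLVM code): a specialization is described by the
// block counts of the call sites that were redirected to it.
struct Specialization {
  std::vector<uint64_t> CallSiteBlockCounts;
};

// A specialization's entry count is the sum of its call sites' block counts.
uint64_t specializationEntryCount(const Specialization &S) {
  uint64_t Sum = 0;
  for (uint64_t C : S.CallSiteBlockCounts)
    Sum += C;
  return Sum;
}

// The original function keeps whatever was not moved to a specialization.
// Each call site's own count is subtracted exactly once (not a running
// total). Saturating at zero is an assumption of this sketch.
uint64_t remainingOriginalCount(uint64_t OriginalCount,
                                const std::vector<Specialization> &Specs) {
  uint64_t Remaining = OriginalCount;
  for (const Specialization &S : Specs)
    for (uint64_t C : S.CallSiteBlockCounts)
      Remaining = C > Remaining ? 0 : Remaining - C;
  return Remaining;
}
```

With an original count of 100 and specializations whose call sites count 10 + 20 and 30, the specializations get 30 and 30 and the original keeps 40.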
2025-08-28 | [FuncSpec] Skip SCCP on blocks of dead functions and poison their callsites (#154668) | XChy | 1 | -4/+17
Fixes #153295. For the test case below:
```llvm
define i32 @caller() {
entry:
  %call1 = call i32 @callee(i32 1)
  %call2 = call i32 @callee(i32 0)
  %cond = icmp eq i32 %call2, 0
  br i1 %cond, label %common.ret, label %if.then

common.ret:                                       ; preds = %entry
  ret i32 0

if.then:                                          ; preds = %entry
  %unreachable_call = call i32 @callee(i32 2)
  ret i32 %unreachable_call
}

define internal i32 @callee(i32 %ac) {
entry:
  br label %ai

ai:                                               ; preds = %ai, %entry
  %add = or i32 0, 0
  %cond = icmp eq i32 %ac, 1
  br i1 %cond, label %aj, label %ai

aj:                                               ; preds = %ai
  ret i32 0
}
```
Before specialization, the SCCP solver determines that `unreachable_call` is unexecutable, as the return value of `callee` can only be zero. After specializing the call sites `call1` and `call2`, FnSpecializer announces that `callee` is a dead function, since all executable call sites are specialized. However, the unexecutable call sites can become executable again after solving the specialized calls. In this test case, `call2` is considered `Overdefined` after specialization, making `cond` also `Overdefined`. Thus, `unreachable_call` becomes executable. This patch skips SCCP on the blocks in dead functions, and poisons the call sites of dead functions.
2025-08-17 | [llvm] Remove unused includes (NFC) (#154051) | Kazu Hirata | 1 | -1/+0
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-08-11 | [PredicateInfo] Use bitcast instead of ssa.copy (#151174) | Nikita Popov | 1 | -13/+5
PredicateInfo needs some no-op to which the predicate can be attached. Currently this is an ssa.copy intrinsic. This PR replaces it with a no-op bitcast. Using a bitcast is more efficient because we don't have the overhead of an overloaded intrinsic. It also makes things slightly simpler overall.
2025-06-17 | [llvm] Lower latency bonus threshold in function specialization. (#143954) | Slava Zakharin | 1 | -1/+1
Related to #143219. Function specialization does not kick in if flang sets `noalias` attributes on the function arguments of `digits_2`, because PRE optimizes away several `srem` instructions and other memory accesses from the inner loops, causing the latency bonus to be lower than the current 40% threshold. While looking at this, I did not really get why we compute the latency bonus as a ratio of the latency of the "eliminated" instructions and the code-size of the whole function. It did not make much sense to me. I tried computing the total latency as a sum of latencies of the instructions that belong to non-dead code (including the instructions that would be executed had they not been "eliminated" due to the constant propagation). This total latency should identify the total cost of executing the function with the given argument being dynamically equal to the tried constant value. Then the latency bonus would be computed as the ratio between the latency of the "eliminated" instructions and the total latency. Unfortunately, this did not give me a good heuristic either. The bonus was close to 0% on some targets, and as big as 3-5% on other targets. This does not match very well with the performance gain achieved by function specialization for exchange2, so it seemed like another artificial heuristic, no better than the current one. It seems that GCC uses a set of different heuristics for function specialization, but I am not an expert here and I cannot say if we can match them in LLVM. With all that said, I decided to try to lower the threshold to avoid the regression and be able to re-enable the generally good change for the `noalias` attribute. With this patch, I was able to reduce the effect of `noalias`, so that `-force-no-alias=true` is only ~10% slower than `-force-no-alias=false` code on neoverse-v1 and neoverse-v2. On neoverse-n1, `-force-no-alias=true` is >2x faster than `-force-no-alias=false` regardless of this patch.
This threshold has been changed before also due to improved alias information: https://github.com/llvm/llvm-project/commit/2fb51fba8ca904a6d3ddf30ae94228ecf9e6a231#diff-066363256b7b4164e66b28a3028b2cb9e405c9136241baa33db76ebd2edb87cd Please let me know what testing I should run to make sure this change is safe. As I understand, it may affect the compilation time performance, and I will appreciate it if someone points out which benchmarks need to be checked before merging this.
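The two ratios discussed in #143954 can be written down directly. This sketch uses hypothetical names and plain integer inputs; it only illustrates the arithmetic, not how the pass gathers the latencies. The threshold test is done in integers to avoid division.

```cpp
#include <cassert>
#include <cstdint>

// Current heuristic (as described in the commit message): latency of the
// eliminated instructions relative to the code size of the whole function,
// compared against a percentage threshold. Names are hypothetical.
bool passesLatencyThreshold(uint64_t EliminatedLatency, uint64_t FunctionSize,
                            uint64_t MinLatencySavingsPercent) {
  // Integer form of: EliminatedLatency / FunctionSize >= Percent / 100.
  return EliminatedLatency * 100 >= MinLatencySavingsPercent * FunctionSize;
}

// Alternative the author tried: eliminated latency relative to the total
// latency of the live code, where the total also includes the eliminated
// instructions.
double latencyBonusOverTotalLatency(uint64_t EliminatedLatency,
                                    uint64_t TotalLatency) {
  assert(TotalLatency > 0 && "total latency must be non-zero");
  return static_cast<double>(EliminatedLatency) /
         static_cast<double>(TotalLatency);
}
```

Lowering the threshold only changes `MinLatencySavingsPercent` in this model; the shape of the first formula stays the same.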
2025-04-23 | [CostModel] Remove optional from InstructionCost::getValue() (#135596) | David Green | 1 | -2/+2
InstructionCost is already an optional value, containing an Invalid state that can be checked with isValid(). There is little point in returning another optional from getValue(). Most uses do not make use of it being a std::optional, dereferencing the value directly (either isValid has been checked previously or the Cost is assumed to be valid). The one case that does, in AMDGPU, used value_or, which has been replaced by an isValid() check.
2025-01-25 | [IPSCCP][FuncSpec] Protect against metadata access from call args. (#124284) | David Green | 1 | -0/+2
Fixes an issue reported in #114964, where metadata arguments were accessed as if they were constants.
2025-01-08 | [LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955) | Ryan Mansfield | 1 | -7/+7
2024-11-06 | [FuncSpec] Query SCCPSolver in more places (#114964) | Hari Limaye | 1 | -21/+21
When traversing the use-def chain of an Argument in a candidate specialization, also query the SCCPSolver to see if a Value is constant. This allows us to better estimate the codesize savings of a candidate in the presence of instructions that are a user of the argument we are estimating savings for which also use arguments that have been found constant by IPSCCP. Similarly when estimating the dead basic blocks from branch and switch instructions which become constant, also query the SCCPSolver to see if a predecessor is unreachable.
2024-11-04 | [FuncSpec] Improve handling of BinaryOperator instructions (#114534) | Hari Limaye | 1 | -8/+9
When visiting BinaryOperator instructions during estimation of codesize savings for a candidate specialization, don't bail when the other operand is not found to be constant. This allows us to find more constants than we otherwise would, for example `and(false, x)`.
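As a toy model of not bailing on an unknown operand, the sketch below folds a binary operator when the one known operand already decides the result, as in `and(false, x)`. The `BinOp` enum and `foldBinOp` are hypothetical; the pass itself works on LLVM instructions and its own constant lattice.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy model: an operand is either a known constant or unknown (nullopt).
using MaybeConst = std::optional<uint64_t>;

enum class BinOp { And, Or, Mul, Add };

// Fold when both operands are known, and also when a single known operand
// decides the result: and(0, x) == 0, or(all-ones, x) == all-ones,
// mul(0, x) == 0. Add always needs both operands.
MaybeConst foldBinOp(BinOp Op, MaybeConst L, MaybeConst R) {
  if (L && R) {
    switch (Op) {
    case BinOp::And: return *L & *R;
    case BinOp::Or:  return *L | *R;
    case BinOp::Mul: return *L * *R;
    case BinOp::Add: return *L + *R;
    }
  }
  MaybeConst Known = L ? L : R;  // at most one operand is known here
  if (!Known)
    return std::nullopt;
  if (Op == BinOp::And && *Known == 0) return uint64_t{0};
  if (Op == BinOp::Or && *Known == UINT64_MAX) return UINT64_MAX;
  if (Op == BinOp::Mul && *Known == 0) return uint64_t{0};
  return std::nullopt;  // the unknown operand still matters
}
```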
2024-11-04 | [FuncSpec] Improve handling of Comparison Instructions (#114073) | Hari Limaye | 1 | -8/+16
When visiting comparison instructions during computation of a specialization's bonus, make use of information from the lattice value of the other operand in the case where we have not found this to have a specific constant value.
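A hedged illustration of using the other operand's lattice value: if the second operand is only known to lie in a range, an unsigned less-than against a constant can still fold when the whole range falls on one side. `Range` and `foldUltConstRange` are invented for this sketch; LLVM's lattice values are richer than a closed interval.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy stand-in for a lattice value: a closed range [Lo, Hi] of possible
// unsigned values (Lo == Hi would mean a known constant).
struct Range {
  uint64_t Lo, Hi;
};

// Fold "C <u X" where C is a known constant and X is only known to lie in R.
// Returns a definite answer when every value in R agrees, else nullopt.
std::optional<bool> foldUltConstRange(uint64_t C, Range R) {
  assert(R.Lo <= R.Hi && "malformed range");
  if (C < R.Lo) return true;    // C is below every possible X
  if (C >= R.Hi) return false;  // no possible X is greater than C
  return std::nullopt;          // some X in R are above C, some are not
}
```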
2024-11-04 | [FuncSpec] Handle ssa_copy intrinsic calls in InstCostVisitor (#114247) | Hari Limaye | 1 | -4/+8
Look through ssa_copy intrinsic calls when computing codesize bonus for a specialization. Also remove redundant logic to skip computing codesize bonus for ssa_copy intrinsics, now that these are considered zero-cost by TTI (in PR #75294).
2024-10-29 | [FuncSpec] Improve accounting of specialization codesize growth (#113448) | Hari Limaye | 1 | -19/+22
Only accumulate the codesize increase of functions that are actually specialized, rather than for every candidate specialization that we analyse. This fixes a subtle bug where prior analysis of candidate specializations that were deemed unprofitable could prevent subsequent profitable candidates from being recognised.
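The bug fixed by #113448 can be reproduced in a small model with a codesize budget. `Candidate`, the budget, and both functions are hypothetical; the point is only that charging growth for rejected candidates can starve a later profitable one.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical candidate: the codesize it would add, and whether the
// (separate) profitability check accepts it.
struct Candidate {
  uint64_t SizeIncrease;
  bool Profitable;
};

// Fixed behaviour: only candidates that are actually specialized consume
// the budget. Returns how many candidates were specialized.
unsigned specializeWithinBudget(const std::vector<Candidate> &Cands,
                                uint64_t Budget) {
  uint64_t Used = 0;
  unsigned Count = 0;
  for (const Candidate &C : Cands) {
    if (!C.Profitable)
      continue;  // rejected candidates no longer count against the budget
    if (Used + C.SizeIncrease > Budget)
      continue;  // would exceed the budget
    Used += C.SizeIncrease;
    ++Count;
  }
  return Count;
}

// Buggy variant: growth is accumulated for every analysed candidate.
unsigned specializeWithinBudgetBuggy(const std::vector<Candidate> &Cands,
                                     uint64_t Budget) {
  uint64_t Used = 0;
  unsigned Count = 0;
  for (const Candidate &C : Cands) {
    if (Used + C.SizeIncrease > Budget)
      continue;
    Used += C.SizeIncrease;  // charged even if C is later rejected
    if (C.Profitable)
      ++Count;
  }
  return Count;
}
```

With an unprofitable candidate of size 80 followed by a profitable one of size 50 and a budget of 100, the fixed version specializes one function and the buggy version none.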
2024-10-29 | [FuncSpec] Enable SpecializeLiteralConstant by default (#113442) | Hari Limaye | 1 | -11/+9
Enable specialization on literal constant arguments by default in Function Specialization. --------- Co-authored-by: Alexandros Lamprineas <alexandros.lamprineas@arm.com>
2024-10-28 | Check hasOptSize() in shouldOptimizeForSize() (#112626) | Ellis Hoag | 1 | -2/+1
2024-10-23 | [FuncSpec] Only compute Latency bonus when necessary (#113159) | Hari Limaye | 1 | -43/+102
Only compute the Latency component of a specialisation's Bonus when necessary, to avoid unnecessarily computing the Block Frequency Information for a Function.
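One way to picture "only compute the Latency component when necessary" is to hide the expensive step (in LLVM, building Block Frequency Information) behind a callback that is reached only when cheaper checks cannot decide. The check order below (inlining bonus, then codesize, then latency) and every name in it are assumptions of this sketch, not a description of the pass's exact logic.

```cpp
#include <cassert>
#include <functional>

// Hypothetical lazy profitability check. ComputeLatency stands in for the
// expensive latency estimate and is invoked only when needed.
struct LazyProfitability {
  std::function<unsigned()> ComputeLatency;
  int Calls = 0;  // how many times the expensive callback ran

  bool isProfitable(unsigned CodeSizeSavings, unsigned MinCodeSize,
                    unsigned InliningBonus, unsigned MinInlining,
                    unsigned MinLatency) {
    if (InliningBonus >= MinInlining)
      return true;   // decided without any latency information
    if (CodeSizeSavings < MinCodeSize)
      return false;  // rejected without any latency information
    ++Calls;
    return ComputeLatency() >= MinLatency;
  }
};
```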
2024-10-18 | [FuncSpec] Update MinFunctionSize logic (#112711) | Hari Limaye | 1 | -5/+10
Always require functions to be larger than MinFunctionSize when SpecializeLiteralConstant is enabled, and increase MinFunctionSize to 500, to prevent excessive triggering of specialisations on small functions.
2024-10-09 | [FuncSpec] Improve estimation of select instruction. (#111176) | Alexandros Lamprineas | 1 | -7/+10
When propagating a constant to a select instruction we only consider the condition operand as the use. I am extending the logic to consider the true and false values too, in case the condition had been found to be constant in a previous propagation but halted.
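A toy model of the select case: once the condition is known, the select folds to whichever arm it picks, provided that arm is known too. Revisiting the select through its true/false operand lets a fold that stalled on an unknown arm complete later. `foldSelect` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using MaybeConst = std::optional<int64_t>;

// Toy fold of "select Cond, TrueVal, FalseVal". With an unknown condition
// nothing folds; with a known condition the result is the chosen arm,
// which may itself still be unknown (the "halted" case).
MaybeConst foldSelect(std::optional<bool> Cond, MaybeConst TrueVal,
                      MaybeConst FalseVal) {
  if (!Cond)
    return std::nullopt;
  return *Cond ? TrueVal : FalseVal;
}
```

In the usage below, the first call models a propagation that halted because the chosen arm was unknown; the second models revisiting the select once that arm became constant.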
2024-08-13 | [LLVM] Don't peek through bitcast on pointers and gep with zero indices. NFC. (#102889) | Yingwei Zheng | 1 | -5/+0
Since we are using opaque pointers now, we don't need to peek through bitcast on pointers and gep with zero indices.
2024-06-12 | FunctionSpecialization: Make the ordering of BestSpecs stricter | Hans Wennborg | 1 | -1/+3
otherwise it's not guaranteed which of two candidates with the same score would get specialized first, or at all.
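A deterministic order needs a tie-breaker. This sketch, with a hypothetical `Spec` record, breaks score ties by discovery index, so the comparator is a strict weak ordering in which no two records with distinct indices compare equivalent. `std::sort` is used here for brevity; it is not a claim about the container the pass uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical specialization record: a score plus its discovery order.
struct Spec {
  std::string Name;
  unsigned Score;
  std::size_t Index;  // deterministic tie-breaker
};

// Higher score first; equal scores fall back to the discovery index.
bool betterSpec(const Spec &A, const Spec &B) {
  if (A.Score != B.Score)
    return A.Score > B.Score;
  return A.Index < B.Index;
}

// Keep the N best candidates in a reproducible order.
std::vector<Spec> pickBest(std::vector<Spec> Specs, std::size_t N) {
  std::sort(Specs.begin(), Specs.end(), betterSpec);
  if (Specs.size() > N)
    Specs.resize(N);
  return Specs;
}
```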
2023-11-22 | [FuncSpec] Update function specialization to handle phi-chains (#72903) | Mats Petersson | 1 | -14/+94
When using the LLVM flang compiler with alias analysis (AA) enabled, SPEC2017:548.exchange2_r was running significantly slower than without the AA. This was caused by the GVN pass replacing many of the loads in the pre-AA code with phi-nodes that form a long chain of dependencies, which the function specialization was unable to follow. This adds a function to discover phi-nodes in a transitive set, with some limitations to avoid spending ages analysing phi-nodes. The minimum latency savings also had to be lowered: fewer load instructions mean less saving. Also adds some more debug prints to help debug the isProfitable decision. No significant change in compile time or generated code-size. (A previous attempt to fix this was abandoned: https://github.com/llvm/llvm-project/pull/71442) --------- Co-authored-by: Alexandros Lamprineas <alexandros.lamprineas@arm.com>
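The transitive phi discovery can be sketched as a bounded worklist walk over a toy phi graph: the web folds if every non-phi incoming value is the same constant, and the walk gives up past a size limit. `Incoming`, `PhiGraph`, `foldPhiChain` and the `MaxPhis` limit are hypothetical; the real limits and data structures may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <set>
#include <vector>

// Toy phi web: each incoming value is either a constant or another phi id.
struct Incoming {
  bool IsPhi;
  int Value;  // constant value, or phi id when IsPhi
};
using PhiGraph = std::map<int, std::vector<Incoming>>;

// Walk the transitive set of phis reachable from Root. If every non-phi
// incoming value is the same constant, the whole web folds to it. Give up
// once more than MaxPhis phis have been visited, to bound compile time.
std::optional<int> foldPhiChain(const PhiGraph &G, int Root,
                                std::size_t MaxPhis) {
  std::set<int> Visited;
  std::vector<int> Worklist{Root};
  std::optional<int> Common;
  while (!Worklist.empty()) {
    int Id = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(Id).second)
      continue;  // already seen; cycles between phis are fine
    if (Visited.size() > MaxPhis)
      return std::nullopt;  // too many phis: stop analysing
    auto It = G.find(Id);
    if (It == G.end())
      return std::nullopt;  // unknown phi id
    for (const Incoming &In : It->second) {
      if (In.IsPhi)
        Worklist.push_back(In.Value);
      else if (!Common)
        Common = In.Value;
      else if (*Common != In.Value)
        return std::nullopt;  // two different constants flow in
    }
  }
  return Common;
}
```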
2023-11-06 | [IPO] Remove unnecessary bitcasts (NFC) | Nikita Popov | 1 | -3/+0
2023-10-05 | Use BlockFrequency type in more places (NFC) (#68266) | Matthias Braun | 1 | -1/+1
The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it more consistently in various APIs and disable implicit conversion to make usage more consistent and explicit.
- Use `BlockFrequency Freq` parameter for `setBlockFreq`, `getProfileCountFromFreq` and `setBlockFreqAndScale` functions.
- Return `BlockFrequency` in `getEntryFreq()` functions.
- While at it, change some `const BlockFrequency& Freq` parameters to plain `BlockFrequency Freq`.
- Mark `BlockFrequency(uint64_t)` constructor as explicit.
- Add missing `BlockFrequency::operator!=`.
- Remove `uint64_t BlockFrequency::getMaxFrequency()`.
- Add `BlockFrequency BlockFrequency::max()` function.
2023-09-19 | [FuncSpec] Adjust the names of specializations and promoted stack values | Alexandros Lamprineas | 1 | -3/+4
Currently the naming scheme is a bit funky; the specializations are named after the original function followed by an arbitrary decimal number. This makes it hard to debug inlined specializations of recursive functions. With this patch I am adding ".specialized." between the original name and the suffix, which is now a single increment counter.
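The new naming scheme amounts to string concatenation. In this sketch, starting the counter at 1 and sharing one counter across all functions are assumptions; `SpecializationNamer` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper producing "<original>.specialized.<counter>".
// The starting value and the counter's scope are assumptions.
class SpecializationNamer {
  unsigned NextId = 1;

public:
  std::string nameFor(const std::string &OriginalName) {
    return OriginalName + ".specialized." + std::to_string(NextId++);
  }
};
```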
2023-08-26 | [NFC][FuncSpec] Update the description of function specialization. | Alexandros Lamprineas | 1 | -39/+0
The code has changed significantly over time, making the description outdated. In this patch I am rewriting the description with an emphasis on the cost model, where most of the changes have happened. Differential Revision: https://reviews.llvm.org/D158723
2023-08-22 | [FuncSpec] Increase the maximum number of times the specializer can run. | Alexandros Lamprineas | 1 | -2/+12
* Changes the default value of FuncSpecMaxIters from 1 to 10. This allows specialization of recursive functions.
* Adds an option to control the maximum codesize growth per function.
* Measured ~45% performance uplift for SPEC2017:548.exchange2_r on AWS Graviton3.
Differential Revision: https://reviews.llvm.org/D145819
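Raising FuncSpecMaxIters above 1 means the specializer is rerun until nothing changes or the limit is hit, so specializations produced in one round (for example of a recursive function) can be specialized again. `runSpecializer` and its callback are hypothetical stand-ins for one specializer run.

```cpp
#include <cassert>
#include <functional>

// Run a hypothetical single specializer pass repeatedly. RunOnce returns
// whether it changed anything. Returns the number of rounds executed.
unsigned runSpecializer(const std::function<bool()> &RunOnce,
                        unsigned MaxIters) {
  unsigned Rounds = 0;
  while (Rounds < MaxIters) {
    ++Rounds;
    if (!RunOnce())
      break;  // fixed point reached: nothing new to specialize
  }
  return Rounds;
}
```

With a limit of 1 (the old default) only the first round ever runs, however much work remains.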
2023-08-10 | [llvm] Use DenseMap::lookup (NFC) | Kazu Hirata | 1 | -3/+1
2023-08-09 | [FuncSpec] Rework the discardment logic for unprofitable specializations. | Alexandros Lamprineas | 1 | -35/+70
Currently we make an arbitrary comparison between codesize and latency in order to decide whether to keep a specialization or not. Sometimes the latency savings are biased in favor of loops because of imprecise block frequencies; therefore, this metric contains a lot of noise. This patch tries to address the problem as follows:
* Reject specializations whose codesize savings are less than X% of the original function size.
* Reject specializations whose latency savings are less than Y% of the original function size.
* Reject specializations whose inlining bonus is less than Z% of the original function size.
I am not saying this is super precise, but at least X, Y and Z are configurable, allowing us to tweak the cost model. Moreover, it lets us prioritize codesize, which is the less noisy metric, over latency. I am also increasing the minimum size a function should have to be considered a candidate for specialization. Initially the cost of a function was calculated as CodeMetrics::NumInsts * InlineConstants::getInstrCost(), which later in D150464 was altered to CodeMetrics::NumInsts, since the metric is supposed to model TargetTransformInfo::TCK_CodeSize. However, we omitted adjusting MinFunctionSize in that commit.
Differential Revision: https://reviews.llvm.org/D157123
2023-08-07 | [FuncSpec] Estimate dead blocks more accurately. | Alexandros Lamprineas | 1 | -17/+33
Currently we only consider basic blocks with a unique predecessor when estimating the size of dead code. However, we could expand this to consider blocks with a back-edge, or blocks preceded by dead blocks. Differential Revision: https://reviews.llvm.org/D156903
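A toy version of the extended dead-block estimate: starting from blocks already known to be dead, a block is also dead if each of its predecessors is dead or is the block itself (a self-loop back-edge), iterated to a fixed point so chains of dead blocks are found. `PredMap` and `estimateDeadBlocks` are hypothetical, and seeding with dead blocks rather than dead CFG edges is a simplification.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy CFG: block id -> predecessor ids.
using PredMap = std::map<int, std::vector<int>>;

// Grow the set of dead blocks until nothing changes. A block with no
// predecessors is treated as the entry block and never marked dead.
std::set<int> estimateDeadBlocks(const PredMap &Preds, std::set<int> Dead) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (const auto &[BB, PredList] : Preds) {
      if (Dead.count(BB) || PredList.empty())
        continue;  // already dead, or the entry block
      bool AllDead = true;
      for (int P : PredList)
        if (P != BB && !Dead.count(P)) {  // self-loops do not keep BB alive
          AllDead = false;
          break;
        }
      if (AllDead) {
        Dead.insert(BB);
        Changed = true;
      }
    }
  }
  return Dead;
}
```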
2023-08-02 | Reland [FuncSpec] Split the specialization bonus into CodeSize and Latency. | Alexandros Lamprineas | 1 | -69/+68
Currently we use a combined metric TargetTransformInfo::TCK_SizeAndLatency when estimating the specialization bonus. This is suboptimal, and in some cases erroneous. For example we shouldn't be weighting the codesize decrease attributed to constant propagation by the block frequency of the dead code. Instead only the latency savings should be weighted by block frequency. The total codesize savings from all the specialization arguments should be deducted from the specialization cost. Differential Revision: https://reviews.llvm.org/D155103
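The split bonus can be modelled with two accumulators: codesize savings are summed as-is, while latency savings are scaled by the block's frequency relative to the function entry. The inputs are hypothetical stand-ins for what TTI (`TCK_CodeSize`, `TCK_Latency`) and BlockFrequencyInfo would provide.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// An instruction that constant propagation would eliminate (toy inputs).
struct EliminatedInst {
  uint64_t CodeSizeCost;
  uint64_t LatencyCost;
  uint64_t BlockFreq;
  uint64_t EntryFreq;
};

struct Bonus {
  uint64_t CodeSize = 0;  // not weighted by frequency
  uint64_t Latency = 0;   // weighted by how often the block runs
};

Bonus computeBonus(const std::vector<EliminatedInst> &Insts) {
  Bonus B;
  for (const EliminatedInst &I : Insts) {
    assert(I.EntryFreq > 0 && "entry frequency must be non-zero");
    B.CodeSize += I.CodeSizeCost;
    B.Latency += I.LatencyCost * I.BlockFreq / I.EntryFreq;
  }
  return B;
}
```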
2023-07-31 | Reland [FuncSpec] Add Phi nodes to the InstCostVisitor. | Alexandros Lamprineas | 1 | -7/+83
This patch allows constant folding of PHIs when estimating the user bonus. Phi nodes are a special case since some of their inputs may remain unresolved until all the specialization arguments have been processed by the InstCostVisitor. Therefore, we keep a list of dead basic blocks and then lazily visit the Phi nodes once the user bonus has been computed for all the specialization arguments. Differential Revision: https://reviews.llvm.org/D154852
2023-07-27 | Revert "[FuncSpec] Add Phi nodes to the InstCostVisitor." | Douglas Yung | 1 | -83/+7
This reverts commit 96ff464dd3aac255adc52787a1e28487a9cd4c35. The test in this change was failing on many buildbots: https://lab.llvm.org/buildbot/#/builders/164/builds/41292 https://lab.llvm.org/buildbot/#/builders/258/builds/4491 https://lab.llvm.org/buildbot/#/builders/192/builds/3566 https://lab.llvm.org/buildbot/#/builders/123/builds/20411 https://lab.llvm.org/buildbot/#/builders/58/builds/42553 https://lab.llvm.org/buildbot/#/builders/247/builds/7037 https://lab.llvm.org/buildbot/#/builders/139/builds/46259 https://lab.llvm.org/buildbot/#/builders/216/builds/24650 https://lab.llvm.org/buildbot/#/builders/234/builds/12571 https://lab.llvm.org/buildbot/#/builders/232/builds/12574 https://lab.llvm.org/buildbot/#/builders/235/builds/975
2023-07-27 | [FuncSpec] Add Phi nodes to the InstCostVisitor. | Alexandros Lamprineas | 1 | -7/+83
This patch allows constant folding of PHIs when estimating the user bonus. Phi nodes are a special case since some of their inputs may remain unresolved until all the specialization arguments have been processed by the InstCostVisitor. Therefore, we keep a list of dead basic blocks and then lazily visit the Phi nodes once the user bonus has been computed for all the specialization arguments. In addition to the last revision this one fixes the bug reported on Phabricator. Differential Revision: https://reviews.llvm.org/D154852
2023-07-26 | Revert "[FuncSpec] Add Phi nodes to the InstCostVisitor." | Alexandros Lamprineas | 1 | -129/+60
Reverting due to the crash reported in D154852. Also reverting the subsequent commit as collateral damage: "[FuncSpec] Split the specialization bonus into CodeSize and Latency."
2023-07-26 | [FuncSpec] Split the specialization bonus into CodeSize and Latency. | Alexandros Lamprineas | 1 | -68/+67
Currently we use a combined metric TargetTransformInfo::TCK_SizeAndLatency when estimating the specialization bonus. This is suboptimal, and in some cases erroneous. For example we shouldn't be weighting the codesize decrease attributed to constant propagation by the block frequency of the dead code. Instead only the latency savings should be weighted by block frequency. The total codesize savings from all the specialization arguments should be deducted from the specialization cost. Differential Revision: https://reviews.llvm.org/D155103
2023-07-25 | [FuncSpec][NFC] Leave a comment for future improvements. | Alexandros Lamprineas | 1 | -0/+3
Adds a TODO for checking inlining opportunities while traversing the users of the specialization arguments. This was brought up in the review of D154852.
2023-07-25 | [FuncSpec] Add Phi nodes to the InstCostVisitor. | Alexandros Lamprineas | 1 | -6/+76
This patch allows constant folding of PHIs when estimating the user bonus. Phi nodes are a special case since some of their inputs may remain unresolved until all the specialization arguments have been processed by the InstCostVisitor. Therefore, we keep a list of dead basic blocks and then lazily visit the Phi nodes once the user bonus has been computed for all the specialization arguments. Differential Revision: https://reviews.llvm.org/D154852
2023-07-14 | [FuncSpec][NFC] Sink cast into function. | Alexandros Lamprineas | 1 | -17/+7
Before looking up a value in the map of known constants we attempt to dynamically cast it. The code looks cleaner if we move the cast inside findConstantFor(), where the look up happens. Differential Revision: https://reviews.llvm.org/D155177
2023-07-11 | [FuncSpec] Prefer DataLayout-aware constant folding of GEPs. | Alexandros Lamprineas | 1 | -6/+3
As shown in D154820, the DataLayout-independent constant folding interface is not good enough for handling GEPs. Instead we should be using the DataLayout-aware constant folding interface. Since there isn't a method to specifically handle GEPs we can use the one which folds generic instruction operands. Differential Revision: https://reviews.llvm.org/D154821
2023-06-30 | [FuncSpec] Avoid crashing when SwitchInst doesn't see ConstantInt | Vincent Lee | 1 | -1/+4
D150464 updated the cost model for function specialization. Unfortunately, it also caused a crash when building stage2 LLD with ThinLTO and assertions enabled. It looks like the issue is caused by mishandling of the Constant in a SwitchInst, since the Constant cannot always be assumed to be safely castable to a ConstantInt. In the crashing case, the Constant was a ConstantExpr, which triggered the assertion. Reviewed By: ChuanqiXu Differential Revision: https://reviews.llvm.org/D154159
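The crash is the classic unchecked-cast mistake. This sketch uses `std::variant` with toy `ConstantInt`/`ConstantExpr` structs (not the LLVM classes) and a checked `std::get_if`, which plays the role `dyn_cast` plays in LLVM code, so a non-integer condition makes the estimate give up instead of asserting. `switchSuccessor` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Toy stand-ins: a folded switch condition may be an integer constant or a
// constant expression that was not reduced to an integer.
struct ConstantInt { int64_t Value; };
struct ConstantExpr { std::string Text; };
using Constant = std::variant<ConstantInt, ConstantExpr>;

// Index of the case taken, DefaultIdx if no case matches, or nullopt when
// the condition is not an integer constant.
std::optional<std::size_t> switchSuccessor(const Constant &C,
                                           const std::vector<int64_t> &CaseValues,
                                           std::size_t DefaultIdx) {
  const ConstantInt *CI = std::get_if<ConstantInt>(&C);
  if (!CI)
    return std::nullopt;  // e.g. a ConstantExpr: give up instead of crashing
  for (std::size_t I = 0; I < CaseValues.size(); ++I)
    if (CaseValues[I] == CI->Value)
      return I;
  return DefaultIdx;
}
```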
2023-06-19 | [FuncSpec] Promote stack values before specialization. | Alexandros Lamprineas | 1 | -74/+54
After each iteration of the function specializer, constant stack values are promoted to constant globals in order to enable recursive function specialization. This should also be done once before running the specializer. Enables specialization of _QMbrute_forcePdigits_2 from SPEC2017:548.exchange2_r. Differential Revision: https://reviews.llvm.org/D152799
2023-06-19 | [FuncSpec] Add Freeze and CallBase to the InstCostVisitor. | Alexandros Lamprineas | 1 | -0/+28
Allows constant folding of such instructions when estimating user bonus. Differential Revision: https://reviews.llvm.org/D153036
2023-06-08 | Reland "[FuncSpec] Improve the accuracy of the cost model" | Alexandros Lamprineas | 1 | -41/+219
Instead of blindly traversing the use-def chain of constant arguments, compute known constants along the way. Stop as soon as a user cannot be replaced by a constant. Keep it light-weight by handling some basic instruction types. Differential Revision: https://reviews.llvm.org/D150464
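A simplified model of "compute known constants along the way": propagate a constant from the argument through its users, record the constant each foldable user produces, and stop at the first user that cannot fold. The two-kind toy IR (`Inst`, `estimateBonus`) is hypothetical and far narrower than the instruction types the pass handles.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy IR: an instruction either adds an immediate to one operand or is
// opaque (cannot be folded).
struct Inst {
  enum Kind { AddConst, Opaque } K;
  int Operand;    // id of the value this instruction uses
  int Imm;        // immediate added for AddConst
  unsigned Cost;  // cost saved if the instruction folds
};

// Propagate Arg = ArgValue through its users, computing known constants as
// we go and summing the cost of instructions that fold. The walk does not
// continue past a user that cannot be replaced by a constant.
unsigned estimateBonus(const std::map<int, Inst> &Insts, int Arg,
                       int ArgValue) {
  std::map<int, int> Known{{Arg, ArgValue}};
  std::vector<int> Worklist{Arg};
  unsigned Bonus = 0;
  while (!Worklist.empty()) {
    int V = Worklist.back();
    Worklist.pop_back();
    for (const auto &[Id, I] : Insts) {
      if (I.Operand != V || Known.count(Id))
        continue;  // not a user of V, or already folded
      if (I.K != Inst::AddConst)
        continue;  // cannot be replaced by a constant: stop here
      Known[Id] = Known[V] + I.Imm;
      Bonus += I.Cost;
      Worklist.push_back(Id);  // keep following this user's users
    }
  }
  return Bonus;
}
```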
2023-06-08 | Reland "[FuncSpec] Replace LoopInfo with BlockFrequencyInfo" | Alexandros Lamprineas | 1 | -19/+14
Using AvgLoopIters on any loop is too imprecise, making the cost model favor users inside loop nests regardless of the actual tripcount. Differential Revision: https://reviews.llvm.org/D150375
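The difference between the two weighting schemes shows up in a small calculation. The LoopInfo approach is modelled here, roughly and as an assumption, as multiplying a user's cost by AvgLoopIters per level of loop nesting; block frequencies instead scale by how often the block actually runs. Both functions are hypothetical illustrations.

```cpp
#include <cassert>
#include <cstdint>

// (a) Loop-depth heuristic: assume every enclosing loop runs AvgLoopIters
// times, whatever its real trip count.
uint64_t weightByLoopDepth(uint64_t Cost, unsigned LoopDepth,
                           uint64_t AvgLoopIters) {
  uint64_t W = Cost;
  for (unsigned D = 0; D < LoopDepth; ++D)
    W *= AvgLoopIters;
  return W;
}

// (b) Block frequency: scale by the block's frequency relative to entry.
uint64_t weightByBlockFreq(uint64_t Cost, uint64_t BlockFreq,
                           uint64_t EntryFreq) {
  assert(EntryFreq > 0 && "entry frequency must be non-zero");
  return Cost * BlockFreq / EntryFreq;
}
```

For a cost of 2 in a doubly nested loop, `weightByLoopDepth(2, 2, 10)` gives 200 regardless of the real trip counts, whereas a block that actually runs 3 times per entry gets `weightByBlockFreq(2, 3, 1)`, which is 6.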
2023-05-30 | Revert "[FuncSpec] Replace LoopInfo with BlockFrequencyInfo" | Nikita Popov | 1 | -260/+52
As reported on https://reviews.llvm.org/D150375#4367861 and following, this change causes PDT invalidation issues. Revert it and dependent commits. This reverts commit 0524534d5220da5ecb2cd424a46520184d2be366. This reverts commit ced90d1ff64a89a13479a37a3b17a411a3259f9f. This reverts commit 9f992cc9350a7f7072a6dbf018ea07142ea7a7ed. This reverts commit 1b1232047e83b69561fd64b9547cb0a0d374473a.
2023-05-25 | [FuncSpec] Enable specialization of literal constants. | Alexandros Lamprineas | 1 | -9/+44
To do so we have to tweak the cost model such that specialization does not trigger excessively. Differential Revision: https://reviews.llvm.org/D150649
2023-05-24 | [FuncSpec] Improve the accuracy of the cost model. | Alexandros Lamprineas | 1 | -41/+219
Instead of blindly traversing the use-def chain of constant arguments, compute known constants along the way. Stop as soon as a user cannot be replaced by a constant. Keep it light-weight by handling some basic instruction types. Differential Revision: https://reviews.llvm.org/D150464