path: root/llvm/lib/Analysis/InlineCost.cpp
Age | Commit message | Author | Files | Lines
2025-08-18 | [Inliner] Add option (default off) to inline all calls regardless of the cost (#152365) | Justin Fargnoli | 1 | -0/+8
Add a default-off option to the inline cost calculation to always inline all viable calls, regardless of the cost/benefit and cost/threshold calculations. For performance reasons, some users require that all calls be inlined. Rather than forcing them to adjust the inlining threshold to an arbitrarily high value, offer an option to inline all calls.
2025-07-09 | Account for inline assembly instructions in inlining cost. (#146628) | Rahman Lavaee | 1 | -0/+55
The inliner currently treats every "call asm" IR instruction as a single instruction, regardless of how many instructions the inline assembly may contain. This may underestimate the cost of inlining for a callee containing long inline assembly. Besides, we may need to assign a higher cost to instructions in inline assembly, since they cannot be analyzed and optimized by the compiler. This PR introduces a new option, `-inline-asm-instr-cost` (zero by default), which controls the cost of inline assembly instructions in the inliner's cost-benefit analysis.
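A minimal sketch of how a per-statement cost for inline assembly could feed into the analyzer. The separator-counting heuristic and both helper names are assumptions for illustration, not the exact code added by the PR:

```cpp
#include "llvm/ADT/StringRef.h"
#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/InstrTypes.h"
using namespace llvm;

// Hypothetical helper: roughly count asm statements by separators. The real
// patch may parse the asm string differently; this is only a sketch.
static unsigned countAsmInstructions(StringRef AsmString) {
  unsigned Count = 0;
  for (char C : AsmString)
    if (C == '\n' || C == ';')
      ++Count;
  return Count + !AsmString.empty();
}

// Hypothetical use inside a call visitor: charge InlineAsmInstrCost per asm
// statement instead of treating the whole "call asm" as a single instruction.
static int asmCallCost(const CallBase &Call, int InlineAsmInstrCost) {
  if (const auto *IA = dyn_cast<InlineAsm>(Call.getCalledOperand()))
    return InlineAsmInstrCost * countAsmInstructions(IA->getAsmString());
  return 0;
}
```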
2025-06-24 | [InlineCost] Simplify extractvalue across callsite (#145054) | Tobias Stadler | 1 | -3/+12
Motivation: when using libc++, `std::bitset<64>::count()` doesn't optimize to a single popcount instruction on AArch64, because we fail to inline the library code completely. Inlining fails because the internal bit_iterator struct is passed as a [2 x i64] %arg value on AArch64. The value is built using insertvalue instructions and only one of the array entries is constant. If we know that this entry is constant, we can prove that half the function becomes dead. However, InlineCost only considers operands for simplification if they are Constants, which %arg is not. Without this simplification the function is too expensive to inline. Therefore, we had to teach InlineCost to support non-Constant simplified values (PR #145083).

Now, we enable this for extractvalue, because we want to simplify the extractvalue with the insertvalues from the caller function. This is enough to get bitset::count fully optimized. There are similar opportunities we can explore for BinOps in the future (e.g. cmp eq %arg1, %arg2 when the caller passes the same value into both arguments), but we need to be careful here, because InstSimplify isn't completely safe to use with operands owned by different functions.
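A rough sketch of the kind of callsite-aware simplification this enables: look up whether the aggregate operand of an extractvalue maps to a (possibly caller-owned, non-Constant) simplified value, and ask InstSimplify whether the extract folds. The `SimplifiedValues` map shape and the control flow are assumptions; `simplifyExtractValueInst` is a real InstructionSimplify entry point:

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Sketch: map from callee values to simplified values (which may live in the caller).
static bool trySimplifyExtractValue(ExtractValueInst &I,
                                    DenseMap<Value *, Value *> &SimplifiedValues,
                                    const SimplifyQuery &SQ) {
  // Prefer the already-simplified aggregate, e.g. an insertvalue chain the
  // caller built for an argument like [2 x i64] %arg.
  Value *Agg = I.getAggregateOperand();
  if (Value *Mapped = SimplifiedValues.lookup(Agg))
    Agg = Mapped;

  // Ask InstSimplify whether the extract folds to a known value.
  if (Value *Simplified = simplifyExtractValueInst(Agg, I.getIndices(), SQ)) {
    SimplifiedValues[&I] = Simplified; // remember for later users in the callee
    return true;                       // treat the instruction as free
  }
  return false;
}
```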
2025-06-23 | [InlineCost] Allow simplifying to non-Constant values (NFCI) (#145083) | Tobias Stadler | 1 | -25/+51
Allow mapping callee Values to arbitrary (non-Constant) simplified values. The simplified values can also originate from the caller. This enables us to simplify instructions in the callee with instructions from the caller. The first use case for this is simplifying extractvalues (PR #145054).
2025-06-04 | [InlineCost]: Optimize inlining of recursive function. (#139982) | Hassnaa Hamdi | 1 | -60/+43
- Consider inlining a recursive function of depth 1 only when the caller is the function itself, instead of inlining it for each callsite, so that we avoid redundant work.
- Use CondContext instead of DomTree for better compilation time.
2025-05-09 | [aarch64][x86][win] Add compiler support for MSVC's /funcoverride flag (Windows kernel loader replaceable functions) (#125320) | Daniel Paoliello | 1 | -0/+4
Adds support for MSVC's undocumented `/funcoverride` flag, which marks functions as being replaceable by the Windows kernel loader. This is used to allow functions to be upgraded depending on the capabilities of the current processor (e.g., the kernel can be built with the naive implementation of a function, but that function can be replaced at boot with one that uses SIMD instructions if the processor supports them).

For each marked function we need to generate:
* An undefined symbol named `<name>_$fo$`.
* A defined symbol `<name>_$fo_default$` that points to the `.data` section (anywhere in the data section; it is assumed to be zero sized).
* An `/ALTERNATENAME` linker directive that points from `<name>_$fo$` to `<name>_$fo_default$`. This is used by the MSVC linker to generate the appropriate metadata in the Dynamic Value Relocation Table.

Marked functions must never be inlined (otherwise those inline sites can't be replaced).

Note that I've chosen to implement this in AsmPrinter as there was no way to create a `GlobalVariable` for `<name>_$fo$` that would result in a symbol being emitted (as nothing consumes it and it has no initializer). I tried to have `llvm.used` and `llvm.compiler.used` point to it, but this didn't help.

Within LLVM I referred to this feature as "loader replaceable" as "function override" already has a different meaning to C++ developers. I also took the opportunity to extract the feature symbol generation code used by both AArch64 and X86 into a common function in AsmPrinter.
2025-05-06 | [InlineCost]: Add a new heuristic to branch folding for better inlining decisions. | Hassnaa Hamdi | 1 | -0/+81
Recursive functions are generally not inlined to avoid issues like infinite inlining or excessive code expansion. However, this conservative approach misses opportunities for optimization in cases where a recursive call is guaranteed to execute only once. This patch detects a scenario where a guarding branch condition of a recursive call will become false after the first iteration of the recursive function. If such a condition is met, and the recursion depth is confirmed to be one, the Inliner will now consider this recursive function for inlining. A new test case (`test/Transforms/Inline/inline-recursive-fn.ll`) has been added to verify this behaviour.
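A small, hypothetical C++ example of the depth-one pattern described above (the function and parameter names are made up for illustration):

```cpp
// Hypothetical example: the guard 'first' is true at the outer call and is
// passed as false to the recursive call, so the recursion depth is exactly 1.
// After the first iteration the guarding condition is known to be false,
// which is the situation the new heuristic detects.
static int accumulate(const int *data, int n, bool first) {
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += data[i];
  if (first)
    sum += accumulate(data, n, /*first=*/false); // executes at most once
  return sum;
}

int total(const int *data, int n) {
  return accumulate(data, n, /*first=*/true);
}
```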
2025-04-03 | [IR][NFC] Use `SwitchInst::defaultDestUnreachable` (#134199) | Yingwei Zheng | 1 | -8/+8
2025-03-30 | InlineCostAnnotationPrinter: Fix constructing random TargetTransformInfo (#133637) | Matt Arsenault | 1 | -4/+7
Query the correct TTI for the current target instead of constructing some random default one. Also query the pass manager for ProfileSummaryInfo. This should only change the printing, not the actual result.
2025-03-25 | [Analysis] Use *Set::insert_range (NFC) (#132878) | Kazu Hirata | 1 | -2/+1
We can use *Set::insert_range to collapse:

  for (auto Elem : Range)
    Set.insert(Elem);

down to:

  Set.insert_range(Range);

In some cases, we can further fold that into the set declaration.
2025-03-22 | Revert "[Analysis][EphemeralValuesAnalysis][NFCI] Remove EphemeralValuesCache class (#132454)" | Vasileios Porpodas | 1 | -11/+8
This reverts commit 4adefcfb856aa304b7b0a9de1eec1814f3820e83.
2025-03-21 | [Analysis][EphemeralValuesAnalysis][NFCI] Remove EphemeralValuesCache class (#132454) | vporpo | 1 | -8/+11
This is a follow-up to https://github.com/llvm/llvm-project/pull/130210. The EphemeralValuesAnalysis pass used to return an EphemeralValuesCache object, which held the ephemeral values, collected them lazily, and supported invalidation via its `clear()` function. This patch removes the EphemeralValuesCache class completely and instead returns the SmallVector containing the ephemeral values.
2025-03-19 | [Analysis][EphemeralValuesCache][InlineCost] Ephemeral values caching for the CallAnalyzer (#130210) | vporpo | 1 | -16/+33
This patch does two things:
1. It implements an ephemeral values cache analysis pass that collects the ephemeral values of a function and caches them for fast lookups. The collection of the ephemeral values is done lazily when the user calls `EphemeralValuesCache::ephValues()`.
2. It adds caching of ephemeral values using the `EphemeralValuesCache` to speed up `CallAnalyzer::analyze()`. Without caching, this can take a long time to run in cases where the function contains a large number of `@llvm.assume()` calls and a large number of callsites. The time is spent in `collectEphemeralValues()`.
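A simplified sketch of the lazy caching idea described above. The class shape is an assumption made for illustration; only `CodeMetrics::collectEphemeralValues` is an existing LLVM API, and the interface added by the patch may differ:

```cpp
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/CodeMetrics.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Sketch: collect a function's ephemeral values once and reuse them for every
// callsite analyzed by the inliner, instead of recollecting per call.
class EphemeralValuesCacheSketch {
  const Function &F;
  AssumptionCache &AC;
  SmallPtrSet<const Value *, 32> EphValues;
  bool Collected = false;

public:
  EphemeralValuesCacheSketch(const Function &F, AssumptionCache &AC)
      : F(F), AC(AC) {}

  const SmallPtrSetImpl<const Value *> &ephValues() {
    if (!Collected) { // lazy: only pay the collection cost on first use
      CodeMetrics::collectEphemeralValues(&F, &AC, EphValues);
      Collected = true;
    }
    return EphValues;
  }

  void clear() { // invalidate if the function changes
    EphValues.clear();
    Collected = false;
  }
};
```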
2025-02-18 | [Analysis] Avoid repeated hash lookups (NFC) (#127574) | Kazu Hirata | 1 | -4/+6
2025-01-18 | [Analysis] Avoid repeated hash lookups (NFC) (#123446) | Kazu Hirata | 1 | -2/+4
Co-authored-by: Nikita Popov <github@npopov.com>
2024-12-04 | [Inliner] Add a helper around `SimplifiedValues.lookup`. NFCI (#118646) | Marina Taylor | 1 | -49/+27
2024-11-29 | [Inliner] Don't count a call penalty for foldable __memcpy_chk and similar (#117876) | Marina Taylor | 1 | -17/+69
When the size is an appropriate constant, __memcpy_chk will turn into a memcpy that gets folded away by InstCombine. Therefore this patch avoids counting these as calls for purposes of inlining costs. This is only really relevant on platforms whose headers redirect memcpy to __memcpy_chk (such as Darwin). On platforms that use intrinsics, memcpy and similar functions are already exempt from call penalties.
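A small C++ illustration of the source pattern in question, assuming Darwin-style fortified headers; the expansion shown in the comment is an approximation, not the exact header macro:

```cpp
#include <cstring>

struct Packet {
  char header[16];
};

// With a constant size and a known destination object size, the fortified
// __memcpy_chk call that headers may substitute for memcpy is expected to
// fold back into a plain (intrinsic) memcpy, so the inliner should not
// charge a call penalty for it.
void fillHeader(Packet &p, const char *src) {
  // On fortifying platforms this roughly becomes:
  //   __builtin___memcpy_chk(p.header, src, 16, __builtin_object_size(p.header, 0));
  // which simplifies away once the size check is provably satisfied.
  std::memcpy(p.header, src, 16);
}
```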
2024-11-01 | [InlineCost] Print inline cost for invoke call sites as well (#114476) | Min-Yih Hsu | 1 | -4/+4
Previously, InlineCostAnnotationPrinter only printed the inline cost for call instructions. There is no reason not to analyze an invoke and its callee as well, so this patch adds that support.
2024-10-11 | [TTI][AMDGPU] Allow targets to adjust `LastCallToStaticBonus` via `getInliningLastCallToStaticBonus` (#111311) | Shilei Tian | 1 | -1/+1
Currently we may be unable to inline a large function even if it has only one live use, because the inline cost is still very high after applying `LastCallToStaticBonus`, which is a constant. This can significantly impact performance because CSR spills are very expensive. This PR adds a new function `getInliningLastCallToStaticBonus` to TTI to allow targets to customize this value. Fixes SWDEV-471398.
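A sketch of how a target might override the new hook. The struct and the scaling factor are assumptions for illustration, not AMDGPU's actual code; `InlineConstants::LastCallToStaticBonus` is the existing default bonus:

```cpp
#include "llvm/Analysis/InlineCost.h"
using namespace llvm;

// Hypothetical target-specific hook (illustrative only). The base TTI
// implementation returns the constant bonus; a target that cares strongly
// about avoiding CSR spills at the remaining call can return more.
struct MyTargetInliningInfo {
  int getInliningLastCallToStaticBonus() const {
    return 10 * InlineConstants::LastCallToStaticBonus; // assumed scaling factor
  }
};
```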
2024-09-29 | [Analysis] Avoid repeated hash lookups (NFC) (#110397) | Kazu Hirata | 1 | -4/+6
2024-08-13 | [DataLayout] Remove constructor accepting a pointer to Module (#102841) | Sergei Barannikov | 1 | -2/+1
The constructor initializes `*this` with `M->getDataLayout()`, which is effectively the same as calling the copy constructor. There does not seem to be a case where a copy would be necessary. Pull Request: https://github.com/llvm/llvm-project/pull/102841
2024-07-09 | [Inline] Remove bitcast handling in `CallAnalyzer::stripAndComputeInBoundsConstantOffsets` (#97988) | Yingwei Zheng | 1 | -2/+0
As we are now using opaque pointers, bitcast handling is no longer needed. Closes https://github.com/llvm/llvm-project/issues/97590.
2024-06-28 | [IR] Add getDataLayout() helpers to Function and GlobalValue (#96919) | Nikita Popov | 1 | -2/+2
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, replacing the current `getParent()->getDataLayout()` pattern.
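The pattern change in a nutshell (illustrative snippet; `F` stands for any `llvm::Function`):

```cpp
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"
using namespace llvm;

const DataLayout &dataLayoutFor(const Function &F) {
  // Old pattern: go through the parent module.
  //   return F.getParent()->getDataLayout();
  // New helper added by the PR:
  return F.getDataLayout();
}
```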
2024-05-10 | Reapply "[InlineCost] Correct the default branch cost for the switch statement (#85160)" | DianQK | 1 | -10/+15
This reverts commit c6e4f6309184814dfc4bb855ddbdb5375cc971e0.
2024-05-09 | Replace uses of ConstantExpr::getCompare. (#91558) | Eli Friedman | 1 | -7/+5
Use ICmpInst::compare() where possible, and ConstantFoldCompareInstOperands in other places. This only changes places where either the fold is guaranteed to succeed, or the code doesn't use the resulting compare if we fail to fold.
2024-05-05 | Revert "[InlineCost] Correct the default branch cost for the switch statement (#85160)" | DianQK | 1 | -15/+10
This reverts commit 882814edd33cab853859f07b1dd4c4fa1393e0ea.
2024-05-05 | [InlineCost] Correct the default branch cost for the switch statement (#85160) | Quentin Dian | 1 | -10/+15
Fixes #81723. The earliest commit of the related code is https://github.com/llvm/llvm-project/commit/919f9e8d65ada6552b8b8a5ec12ea49db91c922a. I tried to understand the code at https://github.com/llvm/llvm-project/blob/5932fcc47855fdd209784f38820422d2369b84b2/llvm/lib/Analysis/InlineCost.cpp#L709-L720 with the help of https://github.com/llvm/llvm-project/pull/77856#issuecomment-1993499085. I think only scenarios where there is a default branch were considered.
2024-03-28 | [InlineCost] Disable cost-benefit when sample based PGO is used (#86626) | Xiangyang (Mark) Guo | 1 | -1/+1
#66457 made InlineCost use cost-benefit analysis by default, which causes a 0.4-0.5% performance regression on multiple internal workloads. See the discussion at https://github.com/llvm/llvm-project/pull/66457. This pull request reverts it. Co-authored-by: helloguo <helloguo@meta.com>
2024-02-11 | [InlineCost] Consider the default branch when calculating cost (#77856) | Quentin Dian | 1 | -8/+13
First step in fixing #76772. This PR considers the default branch as a case branch. This will give the unreachable default branch fair consideration.
2024-01-25 | [Analysis] Use llvm::successors (NFC) | Kazu Hirata | 1 | -3/+2
2024-01-04 | [IR] Fix GEP offset computations for vector GEPs (#75448) | Jannik Silvanus | 1 | -1/+1
Vectors are always bit-packed and don't respect the elements' alignment requirements. This is different from arrays. This means offsets of vector GEPs need to be computed differently than offsets of array GEPs. This PR fixes many places that rely on an incorrect pattern that always relies on `DL.getTypeAllocSize(GTI.getIndexedType())`. We replace these by usages of `GTI.getSequentialElementStride(DL)`, which is a new helper function added in this PR. This changes behavior for GEPs into vectors with element types for which the (bit) size and alloc size is different. This includes two cases:
* Types with a bit size that is not a multiple of a byte, e.g. i1. GEPs into such vectors are questionable to begin with, as some elements are not even addressable.
* Overaligned types, e.g. i16 with 32-bit alignment.
Existing tests are unaffected, but a miscompilation of a new test is fixed.

Co-authored-by: Nikita Popov <github@npopov.com>
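A minimal sketch of the replacement described above; the wrapping helper is illustrative, while the two API calls are the ones named in the commit message:

```cpp
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/GetElementPtrTypeIterator.h"
using namespace llvm;

// Per-index stride used while accumulating a GEP's constant offset.
TypeSize sequentialStride(gep_type_iterator GTI, const DataLayout &DL) {
  // Old pattern (incorrect for bit-packed or over-aligned vector elements):
  //   return DL.getTypeAllocSize(GTI.getIndexedType());
  // New helper, which returns the in-vector stride for vector GEPs and the
  // alloc size for array GEPs:
  return GTI.getSequentialElementStride(DL);
}
```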
2023-10-31 | [AArch64][SME] Extend Inliner cost-model with custom penalty for calls. (#68416) | Sander de Smalen | 1 | -6/+9
This is a stacked PR following on from #68415. This patch has two purposes:
(1) It tries to make inlining more likely when it can avoid a streaming-mode change.
(2) It avoids inlining when inlining causes more streaming-mode changes.
An example of (1) is:
```
void streaming_compatible_bar(void);

void foo(void) __arm_streaming {
  /* other code */
  streaming_compatible_bar();
  /* other code */
}

void f(void) {
  foo(); // expensive streaming mode change
}

->

void f(void) {
  /* other code */
  streaming_compatible_bar();
  /* other code */
}
```
where it wouldn't have inlined the function when foo would be a non-streaming function.
An example of (2) is:
```
void streaming_bar(void) __arm_streaming;

void foo(void) __arm_streaming {
  streaming_bar();
  streaming_bar();
}

void f(void) {
  foo(); // expensive streaming mode change
}

-> (do not inline into)

void f(void) {
  streaming_bar(); // these are now two expensive streaming mode changes
  streaming_bar();
}
```
2023-10-02 | [NFC][Inliner] Introduce another multiplier for cost benefit analysis and make multipliers overridable in TargetTransformInfo. | Mingming Liu | 1 | -13/+76
- The motivation is to expose tunable knobs to control the aggressiveness of inlining for different backends (e.g., machines with different icache sizes, and workloads with different icache/itlb PMU counters). Tuning inline aggressiveness shows a small (~+0.3%) but stable improvement on workloads/hardware that are more frontend bound.
- Both multipliers can be overridden from the command line.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D153154
2023-09-22 | [Analysis] Use std::clamp (NFC) | Kazu Hirata | 1 | -2/+2
2023-09-21 | Revert "[InlineCost] Check for conflicting target attributes early" | Kazu Hirata | 1 | -14/+6
This reverts commit d6f994acb3d545b80161e24ab742c9c69d4bbf33. Several people have reported breakage resulting from this patch:
- https://github.com/llvm/llvm-project/issues/65152
- https://github.com/llvm/llvm-project/issues/65205
2023-09-21 | Reland [InlineCost] Enable the cost benefit analysis for Sample PGO (#66457) | HaohaiWen | 1 | -1/+1
Enables the cost-benefit-analysis-based inliner by default if we have a sample profile. No extra fix is required.
2023-09-21 | Revert "[InlineCost] Enable the cost benefit analysis for Sample PGO (#66457)" | Haohai Wen | 1 | -1/+1
This reverts commit 2f2319cf2406d9830a331cbf015881c55ae78806.
2023-09-21 | [InlineCost] Enable the cost benefit analysis for Sample PGO (#66457) | HaohaiWen | 1 | -1/+1
Enables the cost-benefit-analysis-based inliner by default if we have a sample profile.
2023-09-20 | [InlineCost] Account for switch instructions when the switch condition could be simplified as a result of inlines. | Mingming Liu | 1 | -0/+3
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D152053
2023-09-14 | Avoid BlockFrequency overflow problems (#66280) | Matthias Braun | 1 | -5/+6
Multiplying a raw block frequency with an integer carries a high risk of overflow.
- Add `BlockFrequency::mul`, which returns a std::optional holding the product, or `nullopt` to indicate an overflow.
- Fix two instances where overflow was likely.
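A short usage sketch of the checked multiply described above; the saturating fallback is an assumption chosen for this illustration, not necessarily what the call sites in the patch do:

```cpp
#include "llvm/Support/BlockFrequency.h"
#include <cstdint>
#include <optional>
using namespace llvm;

uint64_t scaledFrequency(BlockFrequency Freq, uint64_t Factor) {
  // mul() reports overflow instead of silently wrapping.
  if (std::optional<BlockFrequency> Product = Freq.mul(Factor))
    return Product->getFrequency();
  return UINT64_MAX; // overflow: saturate (assumed policy for this sketch)
}
```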
2023-08-25 | [Analysis] Fix a comment typo (NFC) | Kazu Hirata | 1 | -1/+1
2023-06-29 | [InlineCost][TargetTransformInfo][AMDGPU] Consider cost of alloca instructions in the caller (1/2) | Juan Manuel MARTINEZ CAAMAÑO | 1 | -2/+9
On AMDGPU, alloca instructions have a penalty that can be avoided when SROA is applied after inlining. This patch introduces the default implementation of TargetTransformInfo::getCallerAllocaCost.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D149740
2023-06-10 | [NFC][SetVector] Update some usages of SetVector to SmallSetVector | Dhruv Chawla | 1 | -3/+1
This patch is a continuation of D152497. It updates usages of SetVector that were found in llvm/ and clang/ which were originally specifying either SmallPtrSet or SmallVector to just using SmallSetVector, as the overhead of SetVector is reduced with D152497. This also helps clean up the code a fair bit, and gives a decent speed boost at -O0 (~0.2%): https://llvm-compile-time-tracker.com/compare.php?from=9ffdabecabcddde298ff313f5353f9e06590af62&to=97f1c0cde42ba85eaa67cbe89bec8fe45b801f21&stat=instructions%3Au
Differential Revision: https://reviews.llvm.org/D152522
2023-06-02 | [InlineCost] Check for conflicting target attributes early | Kazu Hirata | 1 | -6/+14
When we inline a callee into a caller, the compiler needs to make sure that the caller supports a superset of instruction sets that the callee is allowed to use. Normally, we check for the compatibility of target features via functionsHaveCompatibleAttributes, but that happens after we decide to honor the call site attribute Attribute::AlwaysInline. If the caller contains a call marked with Attribute::AlwaysInline, which can happen with __attribute__((flatten)) placed on the caller, the caller could end up with code that cannot be lowered to assembly code. This patch fixes the problem by checking the target feature compatibility before we honor Attribute::AlwaysInline.
Fixes https://github.com/llvm/llvm-project/issues/62664
Differential Revision: https://reviews.llvm.org/D150396
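A hypothetical C++ illustration of the failure mode described above; the function names and the `avx512f` feature are made up for the example:

```cpp
// The callee is compiled with an extra target feature the caller lacks.
__attribute__((target("avx512f")))
static void wide_copy(float *dst, const float *src) {
  for (int i = 0; i < 64; ++i)
    dst[i] = src[i]; // may be lowered with AVX-512 instructions
}

// flatten marks every call in the body as always-inline. Before the fix,
// the target-feature compatibility check ran too late, so wide_copy could
// be inlined into a caller whose subtarget cannot encode AVX-512 code.
__attribute__((flatten))
void copy_all(float *dst, const float *src) {
  wide_copy(dst, src);
}
```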
2023-05-25 | [InlineCost] Consider branches with !make.implicit metadata as free. | Denis Antrushin | 1 | -6/+20
!make.implicit metadata attached to a branch means it will very likely be eliminated (together with the associated cmp instruction).
Reviewed By: apilipenko
Differential Revision: https://reviews.llvm.org/D149747
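A short sketch of how an analyzer might recognize such a branch. The surrounding visitor is assumed for illustration, while `LLVMContext::MD_make_implicit` is the existing fixed metadata kind for this annotation:

```cpp
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
using namespace llvm;

// Sketch: treat a conditional branch annotated with !make.implicit as free,
// since it (and usually its feeding cmp) is expected to be folded into an
// implicit null check later in the pipeline.
static bool isFreeImplicitBranch(const Instruction &I) {
  if (const auto *BI = dyn_cast<BranchInst>(&I))
    return BI->isConditional() &&
           BI->getMetadata(LLVMContext::MD_make_implicit) != nullptr;
  return false;
}
```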
2023-04-27 | Adjust macros which define the ML inlining features. | Jacob Hegna | 1 | -24/+24
This aligns the inlining macros more closely with how the regalloc macros are defined.
- Explicitly specify the dtype/shape
- Remove separate names for python/C++
- Add docstrings for inline cost features
Differential Revision: https://reviews.llvm.org/D149384
2023-04-06 | Revert "[InlineCost] isKnownNonNullInCallee - handle also dereferenceable attribute" | Dávid Bolvanský | 1 | -1/+1
This reverts commit 3b5ff3a67c1f0450a100dca34d899ecd3744cb36.
2023-04-06 | [InlineCost] isKnownNonNullInCallee - handle also dereferenceable attribute | Dávid Bolvanský | 1 | -1/+1
2023-03-16 | [Inliner] clang-format InlineCost.cpp and Inliner.cpp (NFC) | Kazu Hirata | 1 | -6/+6
2023-03-14 | [Analysis] Use *{Set,Map}::contains (NFC) | Kazu Hirata | 1 | -2/+2