path: root/llvm/test/Transforms/InstSimplify
Age | Commit message | Author | Files | Lines
2026-02-13 | [InstructionSimplify] Extend simplifyICmpWithZero to handle equivalent zero RHS (#179055) | Kunqiu Chen | 1 | -0/+212

Add a new helper function `matchEquivZeroRHS()` that recognizes comparisons with constants that are equivalent to comparisons with zero, and transforms the predicate accordingly. This handles the following transformations:

- icmp sgt X, -1 --> icmp sge X, 0
- icmp sle X, -1 --> icmp slt X, 0
- icmp [us]ge X, 1 --> icmp [us]gt X, 0
- icmp [us]lt X, 1 --> icmp [us]le X, 0

This enables more optimization opportunities in `simplifyICmpWithZero`, such as folding icmp sgt X, -1 when X is known to be non-negative.

- IR Impact: https://github.com/dtcxzyw/llvm-opt-benchmark/pull/3414
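A minimal sketch of the kind of fold this enables, written as a hypothetical `-passes=instsimplify` test (the function name and types are illustrative, not taken from the patch):

```llvm
define i1 @nonneg_sgt_minus1(i8 %x) {
  ; zero-extension makes %ext known non-negative
  %ext = zext i8 %x to i32
  ; sgt -1 is treated as sge 0, which is always true for a
  ; non-negative value
  %cmp = icmp sgt i32 %ext, -1
  ret i1 %cmp
}
; simplifies to: ret i1 true
```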
2026-02-05 | IR: Promote "denormal-fp-math" to a first class attribute (#174293) | Matt Arsenault | 3 | -70/+70
Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first class denormal_fpenv attribute. Previously the query for the effective denormal mode involved two string attribute queries with parsing. I'm introducing more uses of this, so it makes sense to convert this to a more efficient encoding. The old representation was also awkward since it was split across two separate attributes. The new encoding just stores the default and float modes as bitfields, largely avoiding the need to consider if the other mode is set. The syntax in the common cases looks like this: `denormal_fpenv(preservesign,preservesign)` `denormal_fpenv(float: preservesign,preservesign)` `denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)` I wasn't sure about reusing the float type name instead of adding a new keyword. It's parsed as a type but only accepts float. I'm also debating switching the name to subnormal to match the current preferred IEEE terminology (also used by nofpclass and other contexts). This has a behavior change when using the command flag debug options to set the denormal mode. The behavior of the flag ignored functions with an explicit attribute set, per the default and f32 version. Now that these are one attribute, the flag logic can't distinguish which of the two components were explicitly set on the function. Only one test appeared to rely on this behavior, so I just avoided using the flags in it. This also does not perform all the code cleanups this enables. In particular the attributor handling could be cleaned up. I also guessed at how to support this in MLIR. I followed MemoryEffects as a reference; it appears bitfields are expanded into arguments to attributes, so the representation there is a bit uglier with the 2 2-element fields flattened into 4 arguments.
2026-01-26 | [ConstantFold] constant fold bfloat <-> half bitcasts (#177663) | Karol Zwolak | 1 | -0/+44
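Both types are 16 bits wide, so the bitcast just reinterprets the bit pattern. A hypothetical example of the kind of fold this enables (not one of the actual tests added):

```llvm
define half @bf16_to_f16_bits() {
  ; bfloat 1.0 has bit pattern 0x3F80; the same bits read as an
  ; IEEE half encode 1.875
  %r = bitcast bfloat 1.0 to half
  ret half %r
}
; constant-folds to: ret half 0xH3F80
```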
2026-01-21 | [IR] Allow non-constant offsets in @llvm.vector.splice.{left,right} (#174693) | Luke Lau | 1 | -0/+94
Following on from #170796, this PR implements the second part of https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974 by allowing non-constant offsets in the vector splice intrinsics.

Previously @llvm.vector.splice had a restriction, enforced by the verifier, that the offset had to be known to be within the range of the vector at compile time. Because we can't enforce this with non-constant offsets, it's been relaxed so that offsets that would slide the vector out of bounds return a poison value, similar to insertelement/extractelement. @llvm.vector.splice.left also previously only allowed offsets within the range 0 <= Offset < N, but this has been relaxed to 0 <= Offset <= N so that it's consistent with @llvm.vector.splice.right. In lieu of the verifier checks that were removed, InstSimplify has been taught to fold splices to poison when the offset is out of bounds.

The cost model isn't implemented in this PR, and just returns invalid for any non-constant offsets for now. I think the correct way to cost these non-constant offsets isn't through getShuffleCost, because shuffle costs can't handle variable masks, but instead just through getIntrinsicInstrCost.
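A sketch of the new InstSimplify fold; the exact mangled intrinsic name and i32 offset type are assumptions based on the existing @llvm.vector.splice convention:

```llvm
declare <4 x i32> @llvm.vector.splice.left.v4i32(<4 x i32>, <4 x i32>, i32)

define <4 x i32> @oob_splice(<4 x i32> %a, <4 x i32> %b) {
  ; a constant offset of 5 exceeds the allowed 0 <= Offset <= 4
  ; range for a 4-element splice, so the result is poison
  %r = call <4 x i32> @llvm.vector.splice.left.v4i32(<4 x i32> %a, <4 x i32> %b, i32 5)
  ret <4 x i32> %r
}
; => ret <4 x i32> poison
```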
2026-01-16 | [InstSimplify] Fall back to the rest of the logic if folding of the consts isn't successful when simplifying fcmp (#176159) | Karol Zwolak | 1 | -0/+34

Fixes #175949.
2026-01-10 | ValueTracking: Check if fmul operand could be undef (#174458) | Matt Arsenault | 1 | -8/+8
In the special case for the same value for both operands, ensure the value isn't undef.
2026-01-09 | [ConstantFolding] Allow truncation when folding wasm.dot | Nikita Popov | 1 | -0/+8
Changes this to getSigned() to match the signedness of the calculation. However, we still need to allow truncation because the addition result may overflow, and the operation is specified to truncate in that case. Fixes https://github.com/llvm/llvm-project/issues/175159.
2026-01-08 | ValueTracking: Check if fdiv operand could be undef (#174453) | Matt Arsenault | 1 | -1/+1
In the special case for fdiv/frem with the same operands, make sure the input isn't undef.
2026-01-08 | [IR] Fix canReplacePointersIfEqual to properly validate vector pointers (#174142) | hanbeom | 1 | -0/+33

Previously, `canReplacePointersIfEqual` unconditionally returned `true` for vectors of pointers (e.g., `<2 x ptr>`) because it only checked for scalar pointer types. This resulted in a failure to perform appropriate verification for these types. This patch fixes the logic to ensure they are properly validated.

Fixes https://github.com/llvm/llvm-project/issues/174045
2026-01-05 | [ValueTracking] Support ptrtoaddr in inequality implication (#173362) | Nikita Popov | 1 | -0/+57
`ptrtoaddr(p1) - ptrtoaddr(p2) == non-zero` implies `p1 != p2`, same as for ptrtoint.
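One plausible shape of such a test; the assume-based formulation is my assumption, the actual tests may establish the condition via dominating branches instead:

```llvm
declare void @llvm.assume(i1)

define i1 @addr_diff_implies_ne(ptr %p1, ptr %p2) {
  %a1 = ptrtoaddr ptr %p1 to i64
  %a2 = ptrtoaddr ptr %p2 to i64
  %diff = sub i64 %a1, %a2
  ; the addresses are known to differ ...
  %nz = icmp ne i64 %diff, 0
  call void @llvm.assume(i1 %nz)
  ; ... so the pointers cannot be equal
  %c = icmp ne ptr %p1, %p2
  ret i1 %c
}
; simplifies to: ret i1 true
```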
2025-12-24 | [ConstantFolding] Add edge cases for llvm.log{,2,10} (#173304) | Stefan Weigl-Bosker | 1 | -0/+104

Addresses https://github.com/llvm/llvm-project/issues/173267.

- folds log(-x) -> NaN
- folds log(0) -> -inf
- also folds log(1) -> 0.0 without host libm

> note: log(inf) is also doable but it causes some other tests to fail so I avoided it for now
2025-12-23 | [ValueTracking] Support ptrtoaddr in isKnownNonZero() (#173275) | Nikita Popov | 1 | -0/+58
Add support for ptrtoaddr in isKnownNonZero(). We can directly forward to isKnownNonZero() for the pointer here, as we define nonnull as applying to the address bits. Also adjust the ptrtoint implementation to match, by requiring that the result type >= address size (rather than >= pointer size). This is just for clarity, in practice this is a non-canonical form.
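A hypothetical test shape: with nonnull defined as applying to the address bits, a nonnull argument makes the ptrtoaddr result known non-zero (assuming the default semantics where null is not a valid pointer):

```llvm
define i1 @nonnull_addr_is_nonzero(ptr nonnull %p) {
  ; isKnownNonZero() now forwards to the pointer, which is nonnull
  %a = ptrtoaddr ptr %p to i64
  %c = icmp ne i64 %a, 0
  ret i1 %c
}
; simplifies to: ret i1 true
```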
2025-12-20 | [InstCombine] Propagate poison through fshl and fshr intrinsics (#172859) | Sayan Sivakumaran | 1 | -19/+14
Currently these intrinsics output `undef` on poison, which triggers CI errors on PRs that want to add poison tests for funnel shifts (such as #172723). Let's make `fshl` and `fshr` propagate poison instead.
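The changed behaviour in test form (a minimal sketch; not one of the actual tests):

```llvm
declare i32 @llvm.fshl.i32(i32, i32, i32)

define i32 @fshl_poison_shamt(i32 %a, i32 %b) {
  ; a poison operand now propagates to a poison result
  ; (previously this folded to undef)
  %r = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 poison)
  ret i32 %r
}
; => ret i32 poison
```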
2025-12-15 | [InstSimplify] Support ptrtoaddr in simplifyICmpInst() (#171985) | Nikita Popov | 1 | -0/+72
This is basically the same change as #162653, but for InstSimplify instead of ConstantFolding. It folds `icmp (ptrtoaddr x, ptrtoaddr y)` to `icmp (x, y)` and `icmp (ptrtoaddr x, C)` to `icmp (x, inttoptr C)`. The fold is restricted to the case where the result type is the address type, as icmp on pointers only compares the address bits. As in the other PR, I think in practice all the folds are also going to work if the ptrtoint result type is larger than the address size, but it's unclear how to justify this in general.
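A sketch of the first fold, assuming a 64-bit address type (illustrative, not one of the tests added):

```llvm
define i1 @cmp_addrs(ptr %x, ptr %y) {
  %ax = ptrtoaddr ptr %x to i64
  %ay = ptrtoaddr ptr %y to i64
  ; comparing the addresses is equivalent to comparing the
  ; pointers, since pointer icmp compares addresses
  %c = icmp eq i64 %ax, %ay
  ret i1 %c
}
; simplifies to: icmp eq ptr %x, %y
```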
2025-12-11 | [ConstantFolding] Support ptrtoaddr in ConstantFoldCompareInstOperands (#162653) | Nikita Popov | 1 | -0/+82
This folds `icmp (ptrtoaddr x, ptrtoaddr y)` to `icmp (x, y)`, matching the existing ptrtoint fold. Restrict both folds to only the case where the result type matches the address type. I think that all folds this can do in practice end up actually being valid for ptrtoint to a type larger than the address size as well, but I don't really see a way to justify this generically without making assumptions about what kind of folding the recursive calls may do. This is based on the icmp semantics specified in https://github.com/llvm/llvm-project/pull/163936.
2025-12-09 | [InstSimplify] Ignore mask when combining vp.reverse(vp.reverse). (#171542) | Craig Topper | 1 | -0/+9
The mask doesn't really affect the reverse. It only poisons the masked off elements in the results. It should be ok to ignore the mask if we can eliminate the pair. I don't have a specific use case for this, but it matches what I had implemented in our downstream before the current upstream implementation. Submitting upstream so I can remove the delta in my downstream.
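A sketch of the fold; requiring the two EVL operands to match is my assumption, and replacing the poisoned lanes with the original values is a legal refinement:

```llvm
declare <4 x i32> @llvm.experimental.vp.reverse.v4i32(<4 x i32>, <4 x i1>, i32)

define <4 x i32> @rev_of_rev(<4 x i32> %x, <4 x i1> %m1, <4 x i1> %m2, i32 %evl) {
  ; two reverses with the same EVL cancel out even if the masks differ
  %a = call <4 x i32> @llvm.experimental.vp.reverse.v4i32(<4 x i32> %x, <4 x i1> %m1, i32 %evl)
  %b = call <4 x i32> @llvm.experimental.vp.reverse.v4i32(<4 x i32> %a, <4 x i1> %m2, i32 %evl)
  ret <4 x i32> %b
}
; simplifies to: ret <4 x i32> %x
```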
2025-12-05 | [ConstantFolding] Handle roundeven libcalls (#170692) | valadaptive | 1 | -16/+8
Basically identical to nearbyint and rint, which we already treat as rounding to nearest with ties to even during constant folding.
2025-12-04 | [InstSimplify] Add roundeven constant-propagation tests (#170688) | valadaptive | 1 | -0/+165
The libcall versions will later be optimized and constant-folded.
2025-12-02 | Avoid maxnum(sNaN, x) optimizations / folds (#170181) | Lewis Crawford | 2 | -17/+30
The behaviour of constant-folding `maxnum(sNaN, x)` and `minnum(sNaN, x)` has become controversial, and there are ongoing discussions about which behaviour we want to specify in the LLVM IR LangRef. See:

- https://github.com/llvm/llvm-project/issues/170082
- https://github.com/llvm/llvm-project/pull/168838
- https://github.com/llvm/llvm-project/pull/138451
- https://github.com/llvm/llvm-project/pull/170067
- https://discourse.llvm.org/t/rfc-a-consistent-set-of-semantics-for-the-floating-point-minimum-and-maximum-operations/89006

This patch removes optimizations and constant-folding support for `maxnum(sNaN, x)` but keeps it folded/optimized for `qNaN`. This should allow for some more flexibility, so the implementation can conform to either the old or new version of the specified semantics without any changes. As far as I am aware, optimizations involving constant `sNaN` should generally be edge cases that rarely occur, so there should hopefully be very little real-world performance impact from disabling these optimizations.
2025-11-20 | [InstSimplify] Extend icmp-of-add simplification to sle/sgt/sge (#168900) | Pedro Lobo | 1 | -12/+102
When comparing additions with the same base where one has `nsw`, the following simplification can be performed:

```llvm
icmp slt/sgt/sle/sge (x + C1), (x +nsw C2)
  =>
icmp slt/sgt/sle/sge C1, C2
```

Previously this was only done for `slt`. This patch extends it to the `sgt`, `sle`, and `sge` predicates when either of the conditions hold:

- `C1 <= C2 && C1 >= 0`, or
- `C2 <= C1 && C1 <= 0`

This patch also handles the `C1 == C2` case, which was previously excluded.

Proof: https://alive2.llvm.org/ce/z/LtmY4f
2025-11-19 | [ConstantFolding] Add constant folding for scalable vector interleave intrinsics. (#168668) | Craig Topper | 1 | -0/+122

We can constant fold interleave of identical splat vectors to a larger splat vector.
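A hypothetical test in the spirit of this fold (names and element types are illustrative):

```llvm
declare <vscale x 4 x i32> @llvm.vector.interleave2.nxv4i32(<vscale x 2 x i32>, <vscale x 2 x i32>)

define <vscale x 4 x i32> @interleave_identical_splats() {
  ; interleaving two identical splats puts the same value in every
  ; lane of the doubled vector
  %r = call <vscale x 4 x i32> @llvm.vector.interleave2.nxv4i32(<vscale x 2 x i32> splat (i32 7), <vscale x 2 x i32> splat (i32 7))
  ret <vscale x 4 x i32> %r
}
; constant-folds to: ret <vscale x 4 x i32> splat (i32 7)
```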
2025-11-19 | [ConstantFolding] Generalize constant folding for vector_deinterleave2 to deinterleave3-8. (#168640) | Craig Topper | 1 | -0/+192
2025-11-19 | [InstSimplify] Add whitespace to struct declarations in vector-calls.ll. NFC | Craig Topper | 1 | -12/+12
This matches how IR is printed.
2025-11-18 | [ConstantFolding] Generalize constant folding for vector_interleave2 to interleave3-8. (#168473) | Craig Topper | 1 | -0/+48
2025-11-18 | [LLVM][InstSimplify] Add folds for SVE integer reduction intrinsics. (#167519) | Paul Walker | 2 | -0/+914

For [andv, eorv, orv, s/uaddv, s/umaxv, s/uminv]:
- sve_reduce_##(none, ?) -> op's neutral value
- sve_reduce_##(any, neutral) -> op's neutral value

For [andv, orv, s/umaxv, s/uminv]:
- sve_reduce_##(all, splat(X)) -> X

For [eorv]:
- sve_reduce_##(all, splat(X)) -> 0
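A sketch of the splat case, assuming the fold fires on a constant all-true predicate written as `splat (i1 true)`; the `orv` reduction ORs all active lanes, so a splat input reduces to the splatted value:

```llvm
declare i32 @llvm.aarch64.sve.orv.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>)

define i32 @orv_all_splat() {
  ; every lane is active and holds 42, so OR-reducing yields 42
  %r = call i32 @llvm.aarch64.sve.orv.nxv4i32(<vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> splat (i32 42))
  ret i32 %r
}
; simplifies to: ret i32 42
```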
2025-11-15 | [InstSimplify] Fix crash when optimizing minmax with bitcast constant vectors (#168055) | Igor Gorban | 1 | -0/+70

When simplifying min/max intrinsics with fixed-size vector constants, InstructionSimplify attempts to optimize element-wise. However, getAggregateElement() can return null for certain constant expressions like bitcasts, leading to a null pointer dereference. This patch adds a check to bail out of the optimization when getAggregateElement() returns null, preventing the crash while maintaining correct behavior for normal constant vectors.

Fixes crash with patterns like:
call <2 x half> @llvm.minnum.v2f16(<2 x half> %x, <2 x half> bitcast (<1 x i32> <i32 N> to <2 x half>))
2025-10-31 | [LLVM][ConstantFolding] Extend constantFoldVectorReduce to include scalable vectors. (#165437) | Paul Walker | 1 | -52/+52
2025-10-28 | Extend vector reduction constant folding tests to include scalable vectors. | Paul Walker | 1 | -57/+361
2025-10-21 | [InstSimplify] Support ptrtoaddr in simplifyGEPInst() (#164262) | Nikita Popov | 1 | -0/+109

This adds support for ptrtoaddr in the `ptradd p, ptrtoaddr(p2) - ptrtoaddr(p) -> p2` fold. This fold requires that p and p2 have the same underlying object (otherwise the provenance may not be the same). The argument I would like to make here is that because the underlying objects are the same (and the pointers are in the same address space), the non-address bits of the pointer must be the same. Looking at some specific cases of underlying object relationship:

* phi/select: Trivially true.
* getelementptr: Only modifies address bits, non-address bits must remain the same.
* addrspacecast round-trip cast: Must preserve all bits because we optimize such round-trip casts away.
* non-interposable global alias: I'm a bit unsure about this one, but I guess the alias and the aliasee must have the same non-address bits?
* various intrinsics like launder.invariant.group, ptrmask. I think these all either preserve all pointer bits (like the invariant.group ones) or at least the non-address bits (like ptrmask). There are some interesting cases like amdgcn.make.buffer.rsrc, but those are cross address-space.

There is a second `gep (gep p, C), (sub 0, ptrtoint(p)) -> C` transform in this function, which I am not extending to handle ptrtoaddr, adding negative tests instead. This transform is overall dubious for provenance reasons, but especially dubious with ptrtoaddr, as then we don't have the guarantee that the provenance of `p` has been exposed.
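A sketch of the extended fold, assuming a 64-bit address type; the GEP keeps the underlying object identical:

```llvm
define ptr @ptradd_addr_diff(ptr %p) {
  %p2 = getelementptr i8, ptr %p, i64 16
  %a2 = ptrtoaddr ptr %p2 to i64
  %a1 = ptrtoaddr ptr %p to i64
  ; %d is the address difference between %p2 and %p
  %d = sub i64 %a2, %a1
  ; adding it back to %p re-creates %p2, including its provenance,
  ; because both share the same underlying object
  %r = getelementptr i8, ptr %p, i64 %d
  ret ptr %r
}
; simplifies to: ret ptr %p2
```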
2025-10-20 | [InstCombine] Move ptrtoaddr tests to InstSimplify (NFC) | Nikita Popov | 1 | -0/+209
All the existing tests test code either in ConstantFolding or InstSimplify, so move them to use -passes=instsimplify instead of -passes=instcombine. This makes sure we keep InstSimplify coverage even if there are subsuming InstCombine folds. This requires writing some of the constant folding tests in a different way, as InstSimplify does not try to re-fold already existing constant expressions.
2025-10-20 | [ValueTracking] Teach isGuaranteedNotToBeUndefOrPoison about splats (#163570) | Cullen Rhodes | 1 | -0/+27
The canonical splat pattern contains two poison values (the insertelement base vector and the second shufflevector operand), but only the poison-ness of the splatted value actually matters.
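A sketch of what this enables, assuming InstSimplify's usual freeze fold sits on top of the ValueTracking change (the test shape is my guess):

```llvm
define <4 x i32> @freeze_of_frozen_splat(i32 %x) {
  %fr = freeze i32 %x
  ; the canonical splat pattern: both poison operands below are
  ; irrelevant, only the splatted value %fr matters
  %ins = insertelement <4 x i32> poison, i32 %fr, i64 0
  %splat = shufflevector <4 x i32> %ins, <4 x i32> poison, <4 x i32> zeroinitializer
  ; %splat is guaranteed not to be undef/poison, so this freeze is a no-op
  %f = freeze <4 x i32> %splat
  ret <4 x i32> %f
}
; simplifies to: ret <4 x i32> %splat
```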
2025-10-17 | [LLVM][ConstProp] Enable intrinsic simplifications for vector ConstantInt based operands. (#159358) | Paul Walker | 5 | -0/+87

Simplification of vector.reduce intrinsics is prevented by an early bailout for ConstantInt base operands. This PR removes the bailout and updates the tests to show matching output when -use-constant-int-for-*-splat is used.
2025-10-14 | [InstSimplify] Support ptrtoaddr in ptrmask fold | Nikita Popov | 1 | -0/+20
Treat it the same way as ptrtoint. ptrmask only operates on the address bits of the pointer.
2025-10-10 | [InstSimplify] Support non-inbounds GEP in ptrdiff fold (#162676) | Nikita Popov | 1 | -5/+1
We can fold ptrdiff(ptradd(p, x), p) to x regardless of whether the ptradd is inbounds. Proof: https://alive2.llvm.org/ce/z/Xuvc7N
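The fold in test form (a hypothetical function; note the absence of `inbounds` on the GEP):

```llvm
define i64 @ptrdiff_no_inbounds(ptr %p, i64 %x) {
  %q = getelementptr i8, ptr %p, i64 %x
  %iq = ptrtoint ptr %q to i64
  %ip = ptrtoint ptr %p to i64
  ; (p + x) - p == x, with or without inbounds
  %d = sub i64 %iq, %ip
  ret i64 %d
}
; simplifies to: ret i64 %x
```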
2025-10-09 | [InstSimplify] Clean up naming in ptr diff test (NFC) | Nikita Popov | 1 | -26/+8
Turns out there already was a test for the non-inbounds variant, so remove the duplicate. Rename the tests to be more meaningful. Drop irrelevant target triple.
2025-10-09 | [InstSimplify] Add test for ptr diff without inbounds (NFC) | Nikita Popov | 1 | -8/+23
Also regenerate the test in current format.
2025-10-07 | [IR] Require DataLayout for pointer cast elimination (#162279) | Nikita Popov | 1 | -2/+3
isEliminableCastPair() currently tries to support elimination of ptrtoint/inttoptr cast pairs by assuming that the maximum possible pointer size is 64 bits. Of course, this is no longer the case nowadays. This PR changes isEliminableCastPair() to accept an optional DataLayout argument, which is required to eliminate pointer casts. This means that we no longer eliminate these cast pairs during ConstExpr construction, and instead only do it during DL-aware constant folding. This had a lot of annoying fallout on tests, most of which I've addressed in advance of this change.
2025-10-07 | [InstSimplify] Optimize maximumnum and minimumnum (#139581) | Lewis Crawford | 1 | -106/+139
Add support for the new maximumnum and minimumnum intrinsics in various optimizations in InstSimplify. Also, change the behavior of optimizing maxnum(sNaN, x) to simplify to qNaN instead of x to better match the LLVM IR spec, and add more tests for sNaN behavior for all 3 max/min intrinsic types.
2025-10-07 | [InstSimplify] Add test for incorrect handling of wide pointers (NFC) | Nikita Popov | 1 | -0/+13
The intermediate integer type is too small to hold the full value.
2025-10-05 | [InstSimplify] Simplify fcmp implied by dominating fcmp (#161090) | Yingwei Zheng | 1 | -0/+207

This patch simplifies an fcmp into true/false if it is implied by a dominating fcmp. As an initial support, it only handles two cases:

+ `fcmp pred1, X, Y -> fcmp pred2, X, Y`: use set operations.
+ `fcmp pred1, X, C1 -> fcmp pred2, X, C2`: use `ConstantFPRange` and set operations.

Note: It doesn't fix https://github.com/llvm/llvm-project/issues/70985, as the second fcmp in the motivating case is not dominated by the edge. We may need to adjust JumpThreading to handle this case.

Comptime impact (~+0.1%): https://llvm-compile-time-tracker.com/compare.php?from=a728f213c863e4dd19f8969a417148d2951323c0&to=8ca70404fb0d66a824f39d83050ac38e2f1b25b9&stat=instructions:u

IR diff: https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2848
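A hypothetical example of the first case (same operands, the second predicate is implied by a dominating branch):

```llvm
define i1 @implied_by_dominating_fcmp(float %x, float %y) {
entry:
  %olt = fcmp olt float %x, %y
  br i1 %olt, label %then, label %else
then:
  ; olt implies one (ordered and not-equal), so this folds to true
  %one = fcmp one float %x, %y
  ret i1 %one
else:
  ret i1 false
}
```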
2025-09-25 | [NVPTX] Fix NaN + overflow semantics of f2ll/d2i (#159530) | Lewis Crawford | 2 | -88/+60
Fix the NaN-handling semantics of various NVVM intrinsics converting from fp types to integer types. Previously in ConstantFolding, NaN inputs would be constant-folded to 0. However, v9.0 of the PTX spec states that in float-to-integer conversions, depending upon conversion types, a NaN input results in the following value:

* Zero if the source is not `.f64` and the destination is not `.s64`, `.u64`.
* Otherwise `1 << (BitWidth(dst) - 1)`, corresponding to the value of `(MAXINT >> 1) + 1` for an unsigned type or `MININT` for a signed type.

Also, support for constant-folding +/-Inf and values which overflow/underflow the integer output type has been added (they clamp to min/max int). Because of this NaN-handling semantic difference, we also need to disable transforming several intrinsics to FPToSI/FPToUI, as the LLVM instruction would return poison where the intrinsics have defined behaviour for edge cases like NaN/Inf/overflow.
2025-09-25 | [NFC][InstSimplify] Refactor fminmax-folds.ll test (#160504) | Lewis Crawford | 1 | -1384/+850

Refactor all the tests in `fminmax-folds.ll` so that they are grouped by optimization, rather than by intrinsic. Instead of calling 1 intrinsic per function, each function now tests all 6 variants of the intrinsic. Results are stored to named pointers to maintain readability in this more compact form. This makes it much easier to compare the outputs from each intrinsic, rather than having them scattered in different functions in different parts of the file. It is also much more compact, so despite adding >50% more tests, the file is ~500 lines shorter.

The tests added include:

* Adding `maximumnum` and `minimumnum` everywhere (currently not optimized, but added as a baseline for future optimizations in #139581).
* Adding separate tests for SNaN and QNaN (as a baseline for correctness improvements in #139581).
* Adding tests for scalable vectors.
* Increasing the variety of types used in various tests by using more f16, f64, and vector types in tests.

The only coverage removed is for tests with undef (only poison is now tested for). Overall, this refactor should increase coverage, improve readability with more comments and clear section headers, and make the tests much more compact and easier to review in #139581 by providing a clear baseline for each intrinsic's current behaviour.
2025-09-24 | [InstSimplify] Consider vscale_range for get active lane mask (#160073) | Matthew Devereau | 1 | -0/+48
Scalable get_active_lane_mask intrinsic calls can be simplified to an all-true i1 splat (ptrue) when their constant range is larger than or equal to the maximum possible number of elements, which can be inferred from vscale_range(x, y).
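A sketch of the inference with an assumed `vscale_range(1,2)`: an `nxv4i1` mask has at most 4 * 2 = 8 lanes, so a constant trip count of 8 covers every possible lane:

```llvm
declare <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64, i64)

define <vscale x 4 x i1> @all_lanes_active() vscale_range(1,2) {
  ; every lane index is < 8, so the mask is all-true (ptrue)
  %m = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 0, i64 8)
  ret <vscale x 4 x i1> %m
}
; simplifies to: ret <vscale x 4 x i1> splat (i1 true)
```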
2025-09-23 | [ConstantFolding] Avoid use of isNonIntegralPointerType() | Alexander Richardson | 2 | -16/+145
Avoiding any new inttoptr is unnecessarily restrictive for "plain" non-integral pointers, but it is important for unstable pointers and pointers with external state. Fixes another test codegen regression from https://github.com/llvm/llvm-project/pull/105735. Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/159959
2025-09-23 | [DataLayout][LangRef] Split non-integral and unstable pointer properties | Alexander Richardson | 1 | -1/+3

This commit adds finer-grained versions of isNonIntegralAddressSpace() and isNonIntegralPointerType() where the current semantics prohibit introduction of both ptrtoint and inttoptr instructions. The current semantics are too strict for some targets (e.g. AMDGPU/CHERI) where ptrtoint has a stable value, but the pointer has additional metadata.

Currently, marking a pointer address space as non-integral also marks it as having an unstable bitwise representation (e.g. when pointers can be changed by a copying GC). This property inhibits a lot of optimizations that are perfectly legal for other non-integral pointers such as fat pointers or CHERI capabilities, which have a well-defined bitwise representation but can't be created with only an address.

This change splits the properties of non-integral pointers and allows for address spaces to be marked as unstable or non-integral (or both) independently using the 'p' part of the DataLayout string. A 'u' following the p marks the address space as unstable, and specifying an index width != representation width marks it as non-integral. Finally, we also add an 'e' flag to mark pointers with external state (such as CHERI capability validity). These pointers require special handling of loads and stores in addition to being non-integral.

This does not change the checks in any of the passes yet - we currently keep the existing non-integral behaviour. In the future I plan to audit calls to DL.isNonIntegral[PointerType]() and replace them with the DL.mustNotIntroduce{IntToPtr,PtrToInt}() checks that allow for more optimizations.

RFC: https://discourse.llvm.org/t/rfc-finer-grained-non-integral-pointer-properties/83176

Reviewed By: nikic, krzysz00

Pull Request: https://github.com/llvm/llvm-project/pull/105735
2025-09-17 | [NFC] Regenerate checks - llvm/test/Transforms/InstSimplify/ConstProp/bswap.ll | Paul Walker | 1 | -18/+21
2025-09-12 | [InstSimplify] Simplify get.active.lane.mask when 2nd arg is zero (#158018) | David Sherwood | 2 | -2/+21
When the second argument passed to the get.active.lane.mask intrinsic is zero we can simplify the instruction to return an all-false mask regardless of the first operand.
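The described fold as a hypothetical test:

```llvm
declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32, i32)

define <4 x i1> @lane_mask_zero_trip(i32 %base) {
  ; each lane is (%base + i) u< 0, which is never true
  %m = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %base, i32 0)
  ret <4 x i1> %m
}
; simplifies to: ret <4 x i1> zeroinitializer
```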
2025-09-11 | [ConstFold] Don't crash on ConstantExprs when folding get_active_lane_mask | Florian Hahn | 1 | -0/+37
Check if operands are ConstantInt to avoid crashing on constant expression after https://github.com/llvm/llvm-project/pull/156659.
2025-09-11 | [ConstantFolding] Fold scalable get_active_lane_masks (#156659) | Matthew Devereau | 1 | -0/+33
Scalable get_active_lane_mask intrinsics with a range of 0 can be lowered to zeroinitializer. This helps remove no-op scalable masked stores and loads.
2025-09-10 | [AMDGPU] Propagate Constants for Wave Reduction Intrinsics (#150395) | Aaditya | 1 | -10/+445