path: root/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
Age | Commit message | Author | Files | Lines
2025-09-12 | [InstCombine] Added optimisation for trunc (Negated Pow2 >> x) to i1 (#157998) | kper | 1 | -4/+10
Follow-up of https://github.com/llvm/llvm-project/pull/157030

```
trunc (lshr i8 C1, V1) to i1 -> icmp ugt V1, cttz(C1) - 1 iff C1 is a negative power of 2
trunc (ashr i8 C1, V1) to i1 -> icmp ugt V1, cttz(C1) - 1 iff C1 is a negative power of 2
```

General proofs:
lshr: https://alive2.llvm.org/ce/z/vVfaJc
ashr: https://alive2.llvm.org/ce/z/8aAcgD
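A minimal IR sketch of the fold with a concrete constant (C1 = -8, so cttz(C1) = 3; illustrative only, not taken from the commit or its tests):

```llvm
define i1 @src(i8 %v) {
  %s = lshr i8 -8, %v       ; -8 is 0b11111000
  %r = trunc i8 %s to i1    ; keeps only bit %v of -8, i.e. true iff %v >= 3
  ret i1 %r
}

define i1 @tgt(i8 %v) {
  %r = icmp ugt i8 %v, 2    ; cttz(-8) - 1 = 2
  ret i1 %r
}
```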
2025-09-10 | [InstCombine] Added optimisation for trunc (Pow2 >> x) to i1 (#157030) | kper | 1 | -0/+21
Closes #156898

I have added two cases. The first one matches when the constant is exactly a power of 2 (sketched below). The second case was meant to address the general case mentioned in the linked issue; however, I did not really solve the general case. We can only emit an `icmp ult` if all the bits are one, and that is only the case when the constant + 1 is a power of 2. Otherwise, we would need to create an `icmp eq` for every bit that is one. Here are a few examples which won't work with the two cases:
- constant is `9`: https://alive2.llvm.org/ce/z/S5FLJZ
- subrange in `56`: https://alive2.llvm.org/ce/z/yn_ZNG
- and finally an example as the worst case (because the bits alternate): https://alive2.llvm.org/ce/z/nDitNA
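A minimal IR sketch of the first (exact power of 2) case with an illustrative constant of 8 (not taken from the commit or its tests): the truncated bit is set only when the shift amount lands exactly on the single set bit.

```llvm
define i1 @src(i8 %x) {
  %s = lshr i8 8, %x        ; 8 is 0b00001000
  %r = trunc i8 %s to i1    ; bit 0 of (8 >> %x) is set iff %x == 3
  ret i1 %r
}

define i1 @tgt(i8 %x) {
  %r = icmp eq i8 %x, 3
  ret i1 %r
}
```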
2025-08-22 | [InstCombine] Allow more users for (add (ptrtoint %B), %O) to GEP transform. (#153566) | Florian Hahn | 1 | -2/+10
Generalize the logic from https://github.com/llvm/llvm-project/pull/153421 to support additional cases where the pointer is only used as an integer.

Alive2 proof: https://alive2.llvm.org/ce/z/po58pP

This enables vectorizing std::find for some cases, if additional assumptions are provided: https://godbolt.org/z/94oq3576E

Depends on https://github.com/llvm/llvm-project/pull/15342.

PR: https://github.com/llvm/llvm-project/pull/153566
2025-08-21 | [InstCombine] Fold inttoptr (add (ptrtoint %B), %O) -> GEP for ICmp users. (#153421) | Florian Hahn | 1 | -0/+13
Replace inttoptr (add (ptrtoint %B), %O) with (getelementptr i8, %B, %O) if all users are ICmp instructions, which in turn means only the address value is compared. We should be able to do this if the source pointer, the integer type, and the destination pointer type have the same bit width and address space.

A common source of such (inttoptr (add (ptrtoint %B), %O)) patterns is various iterators in libc++. In practice this triggers in a number of files in Clang and various open source projects, including cppcheck, diamond, llama and more.

Alive2 proof with constant offset: https://alive2.llvm.org/ce/z/K_5N_B

PR: https://github.com/llvm/llvm-project/pull/153421
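A minimal IR sketch of the shape this fold targets (illustrative names and types, assuming 64-bit pointers; not taken from the commit's tests):

```llvm
define i1 @src(ptr %B, i64 %O, ptr %P) {
  %b = ptrtoint ptr %B to i64
  %a = add i64 %b, %O
  %p = inttoptr i64 %a to ptr   ; only used by the compare below
  %c = icmp eq ptr %p, %P
  ret i1 %c
}

; With the fold applied, the integer round-trip becomes a byte-wise GEP:
define i1 @tgt(ptr %B, i64 %O, ptr %P) {
  %p = getelementptr i8, ptr %B, i64 %O
  %c = icmp eq ptr %p, %P
  ret i1 %c
}
```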
2025-08-15 | [PatternMatch] Allow `m_ConstantInt` to match integer splats (#153692) | zGoldthorpe | 1 | -6/+5
When matching integers, `m_ConstantInt` is a convenient alternative to `m_APInt` for matching unsigned 64-bit integers, allowing one to simplify

```cpp
const APInt *IntC;
if (match(V, m_APInt(IntC))) {
  if (IntC->ule(UINT64_MAX)) {
    uint64_t Int = IntC->getZExtValue();
    // ...
  }
}
```

to

```cpp
uint64_t Int;
if (match(V, m_ConstantInt(Int))) {
  // ...
}
```

However, this simplification is only valid if `V` is a scalar type. Specifically, `m_APInt` also matches integer splats, but `m_ConstantInt` does not. This patch ensures that the matching behaviour of `m_ConstantInt` parallels that of `m_APInt`, and also incorporates it in some obvious places.
2025-07-28 | [InstCombine] Let shrinkSplatShuffle act on vectors of different lengths (#148593) | Adar Dagan | 1 | -2/+6
shrinkSplatShuffle in InstCombine would only move truncs up through shuffles if those shuffles' inputs had exactly the same type as their output. This PR weakens that constraint to only require that the scalar types of the input and output match.
2025-06-16 | [InstCombine] Combine trunc (lshr X, BW-1) to i1 --> icmp slt X, 0 (#142593) (#143846) | mayanksolanki393 | 1 | -0/+6
Fixes #142593; the issue was fixed using the suggestion on the ticket itself.

Godbolt: https://godbolt.org/z/oW5b74jc4
Alive2 proof: https://alive2.llvm.org/ce/z/QHnD7e
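A minimal IR sketch of the fold for i8 (BW = 8; illustrative only):

```llvm
define i1 @src(i8 %x) {
  %s = lshr i8 %x, 7        ; moves the sign bit into bit 0
  %r = trunc i8 %s to i1
  ret i1 %r
}

define i1 @tgt(i8 %x) {
  %r = icmp slt i8 %x, 0    ; the sign bit is set exactly when %x is negative
  ret i1 %r
}
```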
2025-06-16 | [InstCombine] Propagate FMF from fptrunc when folding `fptrunc fabs(X) -> fabs(fptrunc X)` (#143352) | Yingwei Zheng | 1 | -1/+3
Alive2: https://alive2.llvm.org/ce/z/DWV3G3

fptrunc yields infinity when the input cannot fit in the target type, so ninf should be propagated from the fptrunc. For the other intrinsics, the previous check ensures that the result is never an infinity: https://github.com/llvm/llvm-project/blob/5d3899d293e902124c3602b466031b6b799fb123/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp#L1910-L1917

Closes https://github.com/llvm/llvm-project/issues/143122.
2025-06-10 | [NFC][LLVM] Refactor IRBuilder::Create{VScale,ElementCount,TypeSize}. (#142803) | Paul Walker | 1 | -16/+8
CreateVScale took a scaling parameter that had a single use outside of IRBuilder, with all other callers having to create a redundant ConstantInt. To work around this, some code preferred to use CreateIntrinsic directly. This patch simplifies CreateVScale to return a call to the llvm.vscale() intrinsic and nothing more. As well as simplifying the existing call sites, I've also migrated the uses of CreateIntrinsic.

Whilst IRBuilder used CreateVScale's scaling parameter as part of the implementations of CreateElementCount and CreateTypeSize, I have follow-on work to switch them to the NUW variety, and thus they would stop using CreateVScale's scaling as well. To prepare for this I have moved the multiplication and constant folding into the implementations of CreateElementCount and CreateTypeSize.

As a final step I have replaced some callers of CreateVScale with CreateElementCount where it's clear from the code that they wanted the latter.
2025-06-03 | [ValueTracking] Make Depth last default arg (NFC) (#142384) | Ramkumar Ramachandra | 1 | -21/+19
Having a finite Depth (or recursion limit) for computeKnownBits is very limiting, but is currently a load-bearing necessity, as all KnownBits are recomputed on each call and there is no caching. As a prerequisite for an effort to remove the recursion limit altogether, either using a clever caching technique or writing an easily-invalidable KnownBits analysis, make the Depth argument in ValueTracking APIs uniformly the last argument with a default value. This will aid in removing the argument when the time comes, as many callers that currently pass 0 explicitly are now updated to omit the argument altogether.
2025-05-13 | [InstCombine] Narrow trunc(lshr) in more cases (#139645) | Usman Nadeem | 1 | -6/+16
We can narrow `trunc(lshr(i32)) to i8` to `trunc(lshr(i16)) to i8` even when the bits we are shifting in are not zero, in cases where the MSBs of the shifted value don't actually matter and end up being truncated away anyway. This kind of narrowing does not remove the trunc but can help the vectorizer generate better code in a smaller type.

Motivation: libyuv, functions like ARGBToUV444Row_C().

Proof: https://alive2.llvm.org/ce/z/9Ao2aJ
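A minimal sketch of the kind of narrowing described (illustrative widths and shift amount, not taken from the commit's tests); only the low 8 bits of the shift result survive the outer trunc, so the shift can be performed in i16:

```llvm
define i8 @before(i32 %x) {
  %s = lshr i32 %x, 2
  %t = trunc i32 %s to i8
  ret i8 %t
}

define i8 @after(i32 %x) {
  %n = trunc i32 %x to i16
  %s = lshr i16 %n, 2       ; the bits shifted in at the top are truncated away
  %t = trunc i16 %s to i8
  ret i8 %t
}
```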
2025-04-28 | [InstCombine] Support ptrtoint of gep folds for chain of geps (#137323) | Nikita Popov | 1 | -23/+38
Support the ptrtoint(gep null, x) -> x and ptrtoint(gep inttoptr(x), y) -> x+y folds for the case where there is a chain of geps that ends in null or inttoptr. This avoids some regressions from #137297.

While here, also be a bit more careful about edge cases like pointer-to-vector splats and mismatched pointer and index sizes.
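A minimal sketch of the gep-chain-ending-in-null case (illustrative, assuming 64-bit pointers; not taken from the commit's tests):

```llvm
define i64 @src(i64 %x, i64 %y) {
  %p = getelementptr i8, ptr null, i64 %x
  %q = getelementptr i8, ptr %p, i64 %y
  %r = ptrtoint ptr %q to i64
  ret i64 %r
}

define i64 @tgt(i64 %x, i64 %y) {
  %r = add i64 %x, %y
  ret i64 %r
}
```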
2025-03-27 | [InstCombine] Handle scalable splats of constants in getMinimumFPType (#132960) | Luke Lau | 1 | -6/+6
We previously handled ConstantExpr scalable splats in 5d929794a87602cfd873381e11cc99149196bb49, but only for fpexts. ConstantExpr fpexts have since been removed, and at the same time we didn't handle splats of constants that weren't extended. This updates the code to remove the fpext check and instead see if we can shrink the result of getSplatValue. Note that the test case doesn't get completely folded away due to #132922.
2025-01-06 | [IRBuilder] Refactor FMF interface (#121657) | Yingwei Zheng | 1 | -8/+8
Up to now, the only way to set specific FMF flags in IRBuilder is to use `FastMathFlagGuard`. It makes the code ugly and hard to maintain. This patch introduces a helper class `FMFSource` to replace the original parameter `Instruction *FMFSource` in IRBuilder. To maximize compatibility, it accepts an instruction or a specified FMF. This patch also removes the use of `FastMathFlagGuard` in some simple cases.

Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=f87a9db8322643ccbc324e317a75b55903129b55&to=9397e712f6010be15ccf62f12740e9b4a67de2f4&stat=instructions%3Au
2024-12-08 | [InstCombine] Fold trunc nuw/nsw X to i1 -> true IFF X != 0 (#119131) | Andreas Jonson | 1 | -0/+5
Proof: https://alive2.llvm.org/ce/z/prpPex
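A minimal sketch of the fold, using an assume to establish X != 0 (illustrative only): with nuw, the dropped bits are zero, so a non-zero X must have its low bit set.

```llvm
define i1 @src(i8 %x) {
  %nz = icmp ne i8 %x, 0
  call void @llvm.assume(i1 %nz)
  %r = trunc nuw i8 %x to i1
  ret i1 %r
}

define i1 @tgt(i8 %x) {
  %nz = icmp ne i8 %x, 0
  call void @llvm.assume(i1 %nz)
  ret i1 true
}

declare void @llvm.assume(i1)
```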
2024-12-06 | [InstCombine] Make fptrunc combine use intersection of fast math flags (#118808) | John Brawn | 1 | -5/+6
These combines involve swapping the fptrunc with its operand, and using the intersection of fast math flags is the safest option: e.g. if we have (fptrunc (fneg ninf x)), then (fneg ninf (fptrunc x)) will not be correct, because if x is not within the range of the destination type the result of (fptrunc x) will be inf.
2024-12-05 | [InstCombine] Remove nusw handling in ptrtoint of gep fold (NFCI) (#118804) | Nikita Popov | 1 | -4/+1
Now that #111144 infers gep nuw, we no longer have to repeat the inference in this fold.
2024-11-25 | [InstCombine] Remove SPF guard for trunc transforms (#117535) | Nikita Popov | 1 | -9/+0
This shouldn't be necessary anymore now that SPF patterns are canonicalized to intrinsics.
2024-10-11 | [NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752) | Rahul Joshi | 1 | -7/+8
Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation for adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e., just lookup, no creation).
2024-09-28 | [InstCombine] foldVecExtTruncToExtElt - extend to handle trunc(lshr(extractelement(x,c1),c2)) -> extractelement(bitcast(x),c3) patterns. (#109689) | Simon Pilgrim | 1 | -30/+67
This patch moves the existing trunc+extractelement -> extractelement+bitcast fold into a foldVecExtTruncToExtElt helper and extends the helper to handle trunc+lshr+extractelement cases as well.

Fixes #107404
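A minimal sketch of the new trunc+lshr+extractelement shape (illustrative types and constants; the resulting lane index depends on endianness, and a little-endian layout is assumed here):

```llvm
define i32 @src(<2 x i64> %x) {
  %e = extractelement <2 x i64> %x, i64 0
  %s = lshr i64 %e, 32
  %t = trunc i64 %s to i32
  ret i32 %t
}

define i32 @tgt(<2 x i64> %x) {
  %b = bitcast <2 x i64> %x to <4 x i32>
  %e = extractelement <4 x i32> %b, i64 1   ; the high half of lane 0
  ret i32 %e
}
```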
2024-09-17 | [InstCombine] Avoid simplifying bitcast of undef to a zeroinitializer vector (#108872) | Alex MacLean | 1 | -0/+5
In some cases, if an undef value is the product of another instcombine simplification, a bitcast of undef is simplified to a zeroinitializer vector instead of undef.
2024-08-21 | [InstCombine] Extend Fold of Zero-extended Bit Test (#102100) | Marius Kamp | 1 | -6/+13
Previously, (zext (icmp ne (and X, (1 << ShAmt)), 0)) had only been folded if the bit widths of X and the result were equal. Use a trunc or zext instruction to also support other bit widths.

This is a follow-up to commit 533190acdb9d2ed774f96a998b5c03be3df4f857, which introduced a regression: (zext (icmp ne (and (lshr X ShAmt) 1) 0)) is no longer folded to (zext/trunc (and (lshr X ShAmt) 1)), since that commit introduced the fold of (icmp ne (and (lshr X ShAmt) 1) 0) to (icmp ne (and X (1 << ShAmt)) 0). The change introduced by this commit restores that fold.

Alive proof: https://alive2.llvm.org/ce/z/MFkNXs

Relates to issue #86813 and pull request #101838.
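A minimal sketch of the mixed-width case this enables, testing bit 4 of an i32 and producing an i8 (illustrative; not taken from the commit's tests):

```llvm
define i8 @src(i32 %x) {
  %m = and i32 %x, 16           ; 1 << 4
  %c = icmp ne i32 %m, 0
  %r = zext i1 %c to i8
  ret i8 %r
}

define i8 @tgt(i32 %x) {
  %s = lshr i32 %x, 4
  %a = and i32 %s, 1
  %r = trunc i32 %a to i8       ; trunc (or zext) bridges the width mismatch
  ret i8 %r
}
```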
2024-08-01 | [InstCombine] Recognize copysign idioms (#101324) | Yingwei Zheng | 1 | -0/+24
This patch folds `(bitcast (or (and (bitcast X to int), signmask), nneg Y) to fp)` into `copysign((bitcast Y to fp), X)`. I found this pattern exists in some graphics applications/math libraries.

Alive2: https://alive2.llvm.org/ce/z/ggQZV2
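A minimal sketch of the idiom for f32 (illustrative; the `and` with 0x7fffffff supplies the non-negative Y that the fold requires):

```llvm
define float @src(float %x, float %y) {
  %xi  = bitcast float %x to i32
  %sgn = and i32 %xi, -2147483648     ; sign bit of %x
  %yi  = bitcast float %y to i32
  %mag = and i32 %yi, 2147483647      ; non-negative payload (sign bit cleared)
  %or  = or i32 %sgn, %mag
  %r   = bitcast i32 %or to float
  ret float %r
}

define float @tgt(float %x, float %y) {
  %yi   = bitcast float %y to i32
  %mag  = and i32 %yi, 2147483647
  %magf = bitcast i32 %mag to float
  %r    = call float @llvm.copysign.f32(float %magf, float %x)
  ret float %r
}

declare float @llvm.copysign.f32(float, float)
```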
2024-07-25 | Fix unused variable warning. NFC. | Simon Pilgrim | 1 | -1/+1
2024-07-25 | Remove the `x86_mmx` IR type. (#98505) | James Y Knight | 1 | -7/+0
It is now translated to `<1 x i64>`, which allows the removal of a bunch of special casing.

This _incompatibly_ changes the ABI of any LLVM IR function with `x86_mmx` arguments or returns: instead of passing in mmx registers, they will now be passed via integer registers. However, the real-world incompatibility caused by this is expected to be minimal, because Clang never uses the x86_mmx type -- it lowers `__m64` to either `<1 x i64>` or `double`, depending on ABI.

This change does _not_ eliminate the SelectionDAG `MVT::x86mmx` type. That type simply no longer corresponds to an IR type, and is used only by MMX intrinsics and inline-asm operands. Because SelectionDAGBuilder only knows how to generate the operands/results of intrinsics based on the IR type, it thus now generates the intrinsics with the type MVT::v1i64, instead of MVT::x86mmx. We need to fix this before the DAG LegalizeTypes, and thus have the X86 backend fix them up in DAGCombine. (This may be a short-lived hack, if all the MMX intrinsics can be removed in upcoming changes.)

Works towards issue #98272.
2024-07-12 | Revert "[InstCombine] Generalize ptrtoint(gep) fold (NFC)" | Nikita Popov | 1 | -21/+4
This reverts commit c45f939e34dafaf0f57fd1d93df7df5cc89f1dec. This refactoring turned out to not be useful for the case I had originally in mind, so revert it for now.
2024-07-12 | [InstCombine] Generalize ptrtoint(gep) fold (NFC) | Nikita Popov | 1 | -4/+21
We're currently handling a special case of ptrtoint gep -> add ptrtoint. Reframe the code to make it easier to add more patterns for this transform.
2024-07-11 | [InstCombine] More precise nuw preservation in ptrtoint of gep fold | Nikita Popov | 1 | -1/+3
We can transfer a nuw flag from the gep to the add. Additionally, the inbounds + nneg case can be relaxed to nusw + nneg. Finally, don't forget to pass the correct context instruction to SimplifyQuery.
2024-06-18 | [InstCombine] Avoid use of ConstantExpr::getShl() | Nikita Popov | 1 | -3/+3
Use IRBuilder instead. Also use ImmConstant to guarantee that this will fold.
2024-06-17 | [InstCombine] Don't preserve context across div | Nikita Popov | 1 | -4/+6
We can't preserve the context across a non-speculatable instruction, as this might introduce a trap. Alternatively, we could also insert all the replacement instructions at the use-site, but that would be a more intrusive change for the sake of this edge case.

Fixes https://github.com/llvm/llvm-project/issues/95547.
2024-05-20 | [InstCombine] Fold pointer adding in integer to arithmetic add (#91596) | Monad | 1 | -4/+16
Fold

```llvm
define i32 @src(i32 %x, i32 %y) {
  %base = inttoptr i32 %x to ptr
  %ptr = getelementptr inbounds i8, ptr %base, i32 %y
  %r = ptrtoint ptr %ptr to i32
  ret i32 %r
}
```

where both `%base` and `%ptr` have only one use, to

```llvm
define i32 @tgt(i32 %x, i32 %y) {
  %r = add i32 %x, %y
  ret i32 %r
}
```

The `add` can be `nuw` if the GEP is `inbounds` and the offset is non-negative. The relevant Alive2 proof is https://alive2.llvm.org/ce/z/nP3RWy.

### Motivation

It seems unnecessary to convert `int` to `ptr` just to get its offset. In most cases they generate the same assembly, but sometimes some optimizations may be missed, since the analysis of `GEP` is not as precise as that of arithmetic operations. One example is https://github.com/dtcxzyw/llvm-opt-benchmark/blob/e3c822bf41df3a88ca38eba884a52b0cc7e70bf2/bench/protobuf/optimized/generated_message_reflection.cc.ll#L39860-L39873

```llvm
%conv.i188 = zext i32 %145 to i64
%add.i189 = add i64 %conv.i188, %125
%146 = load i16, ptr %num_aux_entries10.i, align 2
%conv2.i191 = zext i16 %146 to i64
%mul.i192 = shl nuw nsw i64 %conv2.i191, 3
%add3.i193 = add i64 %add.i189, %mul.i192
%147 = inttoptr i64 %add3.i193 to ptr
%sub.ptr.lhs.cast.i195 = ptrtoint ptr %144 to i64
%sub.ptr.rhs.cast.i196 = ptrtoint ptr %143 to i64
%sub.ptr.sub.i197 = sub i64 %sub.ptr.lhs.cast.i195, %sub.ptr.rhs.cast.i196
%add.ptr = getelementptr inbounds i8, ptr %147, i64 %sub.ptr.sub.i197
%sub.ptr.lhs.cast = ptrtoint ptr %add.ptr to i64
%sub.ptr.sub = sub i64 %sub.ptr.lhs.cast, %125
```

where `%conv.i188` first adds `%125` and then subtracts `%125` (the result is `%sub.ptr.sub`), which can be optimized.
2024-04-30 | [InstCombine] Fold `trunc nuw/nsw (x xor y) to i1` to `x != y` (#90408) | Monad | 1 | -0/+6
Fold:

```llvm
define i1 @src(i8 %x, i8 %y) {
  %xor = xor i8 %x, %y
  %r = trunc nuw/nsw i8 %xor to i1
  ret i1 %r
}

define i1 @tgt(i8 %x, i8 %y) {
  %r = icmp ne i8 %x, %y
  ret i1 %r
}
```

Proof: https://alive2.llvm.org/ce/z/dcuHmn
2024-04-25 | [InstCombine] Extract logic for "emit offset and rewrite gep" (NFC) | Nikita Popov | 1 | -3/+3
2024-04-18 | [IR][PatternMatch] Only accept poison in getSplatValue() (#89159) | Nikita Popov | 1 | -1/+1
In #88217 a large set of matchers was changed to only accept poison values in splats, but not undef values. This is because we now use poison for non-demanded vector elements, and allowing undef can cause correctness issues.

This patch covers the remaining matchers by changing the AllowUndef parameter of getSplatValue() to AllowPoison instead. We also carry out corresponding renames in matchers.

As a followup, we may want to change the default for things like m_APInt to m_APIntAllowPoison (as this is much less risky when only allowing poison), but this change doesn't do that.

There is one caveat here: We have a single place (X86FixupVectorConstants) which does require handling of vector splats with undefs. This is because this works on backend constant pool entries, which currently still use undef instead of poison for non-demanded elements (because SDAG as a whole does not have an explicit poison representation). As it's just the single use, I've open-coded a getSplatValueAllowUndef() helper there, to discourage use in any other places.
2024-04-17 | [InstCombine] Use `auto *` instead of `auto` in `visitSIToFP`; NFC | Noah Goldstein | 1 | -1/+1
2024-04-16 | [InstCombine] Add canonicalization of `sitofp` -> `uitofp nneg` | Noah Goldstein | 1 | -2/+16
This is essentially the same as #82404 but has the `nneg` flag, which allows the backend to reliably undo the transform.

Closes #88299
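A minimal sketch of the canonicalization (illustrative; the mask makes the operand provably non-negative):

```llvm
define float @before(i32 %x) {
  %p = and i32 %x, 2147483647      ; known non-negative
  %r = sitofp i32 %p to float
  ret float %r
}

define float @after(i32 %x) {
  %p = and i32 %x, 2147483647
  %r = uitofp nneg i32 %p to float ; nneg records the fact for the backend
  ret float %r
}
```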
2024-04-11 | [InstCombine] Infer nsw/nuw for trunc (#87910) | Yingwei Zheng | 1 | -1/+14
This patch adds support for inferring trunc's nsw/nuw flags.
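A minimal sketch of the kind of inference this allows (illustrative; the mask makes the dropped bits provably zero and the value fit in a signed i8):

```llvm
define i8 @example(i32 %x) {
  %m = and i32 %x, 127
  %t = trunc i32 %m to i8
  ret i8 %t
}

; After inference the trunc can carry both flags:
;   %t = trunc nuw nsw i32 %m to i8
```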
2024-03-29 | [InstCombine] Remove the canonicalization of `trunc` to `i1` (#84628) | Monad | 1 | -11/+18
Remove the canonicalization of `trunc` to `i1` according to the suggestion of https://github.com/llvm/llvm-project/pull/83829#issuecomment-1986801166

https://github.com/llvm/llvm-project/blob/a84e66a92d7b97f68aa3ae7d2c5839f3fb0d291d/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp#L737-L745

Alive2: https://alive2.llvm.org/ce/z/cacYVA
2024-03-20 | Revert "[InstCombine] Canonicalize `(sitofp x)` -> `(uitofp x)` if `x >= 0`" | Noah Goldstein | 1 | -5/+1
This reverts commit d80d5b923c6f611590a12543bdb33e0c16044d44. It wasn't a particularly important transform to begin with and caused some codegen regressions on targets that prefer `sitofp`, so drop it. Might revisit along with adding the `nneg` flag to `uitofp` so it's easily reversible for the backend.
2024-03-19 | [InstCombine] Fold `fpto{s|u}i non-norm` to zero (#85569) | Yingwei Zheng | 1 | -0/+19
This patch enables more optimization after canonicalizing `fmul X, 0.0` into a copysign. I decided to implement this fold in InstCombine because `computeKnownFPClass` may be expensive.

Alive2: https://alive2.llvm.org/ce/z/ASM8tQ
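A minimal sketch of the idea (illustrative only; copysign of +0.0 can only produce a zero, so the integer conversion is always 0):

```llvm
define i32 @src(float %x) {
  %z = call float @llvm.copysign.f32(float 0.0, float %x)
  %r = fptosi float %z to i32
  ret i32 %r
}

define i32 @tgt(float %x) {
  ret i32 0
}

declare float @llvm.copysign.f32(float, float)
```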
2024-03-13 | [InstCombine] Canonicalize `(sitofp x)` -> `(uitofp x)` if `x >= 0` | Noah Goldstein | 1 | -1/+5
Just a standard canonicalization.

Proofs: https://alive2.llvm.org/ce/z/9W4VFm

Closes #82404
2024-03-13 | [InstCombine] Simplify `zext nneg i1 X` to zero (#85043) | Yingwei Zheng | 1 | -0/+4
Alive2: https://alive2.llvm.org/ce/z/Wm6kCk
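A minimal sketch (illustrative): as a signed 1-bit integer, the only non-negative value is 0, so `zext nneg i1` is either 0 or poison and can be replaced by 0.

```llvm
define i8 @src(i1 %x) {
  %r = zext nneg i1 %x to i8
  ret i8 %r
}

define i8 @tgt(i1 %x) {
  ret i8 0
}
```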
2024-03-06 | [InstCombine] Fix shift calculation in InstCombineCasts (#84027) | Quentin Dian | 1 | -2/+2
Fixes #84025.
2024-02-22 | [InstCombine] Pick bfloat over half when shrinking ops that started with an fpext from bfloat (#82493) | Benjamin Kramer | 1 | -9/+14
This fixes the case where we would shrink an frem to half and then bitcast to bfloat, producing invalid results. The transformation was written under the assumption that there is only one type with a given bit width.

Also add a strategic assert to CastInst::CreateFPCast to turn this miscompilation into a crash.
2024-02-07 | [PatternMatch] Add a matching helper `m_ElementWiseBitCast`. NFC. (#80764) | Yingwei Zheng | 1 | -3/+9
This patch introduces a matching helper `m_ElementWiseBitCast`, which is used for matching element-wise int <-> fp casts. The motivation of this patch is to avoid duplicating checks in https://github.com/llvm/llvm-project/pull/80740 and https://github.com/llvm/llvm-project/pull/80414.
2024-01-22 | [InstCombine] Try to fold trunc(shuffle(zext)) to just a shuffle (#78636) | Alexey Bataev | 1 | -0/+13
Tries to remove extra trunc/ext instructions for shufflevector instructions.

Differential Review: https://github.com/llvm/llvm-project/pull/78636
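A minimal sketch of the fold (illustrative element types and mask): when the trunc returns to the original element type, the zext/trunc pair around the shuffle cancels.

```llvm
define <4 x i8> @src(<4 x i8> %x) {
  %z = zext <4 x i8> %x to <4 x i32>
  %s = shufflevector <4 x i32> %z, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %t = trunc <4 x i32> %s to <4 x i8>
  ret <4 x i8> %t
}

define <4 x i8> @tgt(<4 x i8> %x) {
  %s = shufflevector <4 x i8> %x, <4 x i8> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x i8> %s
}
```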
2024-01-19 | Revert "[InstCombine] Try to fold trunc(shuffle(zext)) to just a shuffle (#78636)" | Pranav Kant | 1 | -10/+0
This reverts commit 4d11f04b20f0bd7488e19e8f178ba028412fa519. This breaks some programs as mentioned in #78636.
2024-01-19 | [InstCombine] Try to fold trunc(shuffle(zext)) to just a shuffle (#78636) | Alexey Bataev | 1 | -0/+10
Tries to remove extra trunc/ext instructions for shufflevector instructions.
2023-11-29 | [ValueTracking] Convert isKnownNonNegative() to use SimplifyQuery (NFC) | Nikita Popov | 1 | -2/+2
2023-11-21 | [InstCombine] Fix incorrect nneg inference on shift amount | Nikita Popov | 1 | -1/+3
Whether this is valid depends on the bit widths of the involved integers. Fixes https://github.com/llvm/llvm-project/issues/72927.