Age | Commit message | Author | Files | Lines |
|
These select instructions are created from non-branching instructions,
so their branch weights are unknown.
Tracking issue: #147390
|
|
A common idiom is to use the PatternMatch match function within a
functional algorithm like all_of. Introduce a match functor to shorten
this idiom.
Co-authored-by: Luke Lau <luke@igalia.com>
|
|
Since FSub X, 0 gets canonicalised to FAdd X, -0, this optimization
didn't make much sense for FSub. Remove it from InstCombine along with
the accompanying testcase.
|
|
(#157757)
When the result of an FAdd is used by fabs, we can safely ignore the
sign bit of a floating-point zero. This patch enables an instruction
simplification that folds fadd x, 0 ==> x, which otherwise cannot fire
because the compiler cannot prove that the zero isn't -0. But if the
result of the fadd is used by fabs, the sign of the zero is irrelevant,
so we can still perform the optimization.
Fixes #154238
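A minimal IR sketch of the pattern (alive2-style src/tgt; function
names are illustrative, not from the patch):
```
; With a bare fadd x, 0.0 the fold is blocked: x could be -0.0 and
; -0.0 + 0.0 == +0.0. The fabs user makes the sign of zero irrelevant.
define float @src(float %x) {
  %a = fadd float %x, 0.000000e+00
  %f = call float @llvm.fabs.f32(float %a)
  ret float %f
}

define float @tgt(float %x) {
  %f = call float @llvm.fabs.f32(float %x)
  ret float %f
}

declare float @llvm.fabs.f32(float)
```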
|
|
This PR closes #157524.
alive2: https://alive2.llvm.org/ce/z/xe_vb2
godbolt: https://alive2.llvm.org/ce/z/7A8PxK
This fold is invalid for `@llvm.smin.i1` since `smin(-1, 0) == -1`. I
also avoided i1 in general since this uses zext, but it seems like those
checks for width might not be necessary, since other folds get to it
first.
The alive2 proof in #157524 used a select for the fold, but it seems
like `select X < Y, 1, 0` should be canonicalized to `zext X < Y` if the
bit width is correct.
|
|
Transforms using this helper will add up all the offsets, so we should
use intersectForOffsetAdd() instead of a plain intersection.
Annoyingly, this requires special handling of the first GEP to avoid
losing flags in that case.
Fixes https://github.com/llvm/llvm-project/issues/157714.
|
|
When using PatternMatch, there is a common problem where we want to both
match something against a pattern and capture the value/instruction for
various reasons (e.g. to access flags).
Currently the two ways to do that are to either capture using
m_Value/m_Instruction and do a separate match on the result, or to use
the somewhat awkward `m_CombineAnd(m_XYZ, m_Value(V))` pattern.
This PR adds a variant of `m_Value`/`m_Instruction` which does both a
capture and a match. `m_Value(V, m_XYZ)` is basically equivalent to
`m_CombineAnd(m_XYZ, m_Value(V))`.
I've ported two InstCombine files to this pattern as a sample.
|
|
When converting gep subtraction / comparison to offset subtraction /
comparison, avoid expanding very long multi-use gep chains.
|
|
If C1 is 1 and we're working with a power of two divisor, this will end
up replacing the `and` for the remainder with a multiply and a longer
dependency chain.
Fixes https://github.com/llvm/llvm-project/issues/147176.
|
|
After https://github.com/llvm/llvm-project/pull/146100, the offset may
be generated by a previous call to `EmitGEPOffsets`, causing the nuw
flag on shl to be lost. This patch handles the `shl+ptradd` case as
well. It also fixes a miscompilation in the case of `mul + ptradd`.
Alive2: https://alive2.llvm.org/ce/z/BeaNzE
This patch removes many unnecessary masking operations in Rust programs
that use the `ptr_offset_from_unsigned` intrinsic:
https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2538/files
|
|
This follows on from
https://github.com/llvm/llvm-project/pull/144933#issuecomment-2992372627,
and allows us to remove the reverse (fneg (reverse x)) combine.
A separate patch will handle the case for fabs. I haven't checked
whether we perform this canonicalization for either unops or binops for
vp.reverse.
|
|
This canonicalizes fneg/fabs (shuffle X, poison, mask) -> shuffle
(fneg/fabs X), poison, mask.
This undoes part of b331a7ebc1e02f9939d1a4a1509e7eb6cdda3d38 and
a8f13dbdeb31be37ee15b5febb7cc2137bbece67, but keeps the binary shuffle
case, i.e. shuffle fneg, fneg, mask.
By pulling out the shuffle we bring this in line with the same
canonicalisation we perform on binary ops and intrinsics; the original
commit acknowledged that it went in the opposite direction.
However, nowadays VectorCombine is more powerful and can do more
optimisations when the shuffle is pulled out, so I think we should
revisit this. In particular we get more shuffles folded and can perform
scalarization.
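An illustrative IR pair for the unary case (the reversing mask is just
an example):
```
define <4 x float> @before(<4 x float> %x) {
  %s = shufflevector <4 x float> %x, <4 x float> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %r = fneg <4 x float> %s
  ret <4 x float> %r
}

; fneg is applied lanewise, so it commutes with the permutation
define <4 x float> @after(<4 x float> %x) {
  %n = fneg <4 x float> %x
  %r = shufflevector <4 x float> %n, <4 x float> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x float> %r
}
```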
|
|
This pattern often appears in Flang-generated LLVM IR, for example in
the counts of the loops generated for array expressions like
`a(x:x+y)` or `a(x+z:x+z)` or their variations.
To compute the loop count, Flang needs to subtract the lower bound
of the array slice from its upper bound. To avoid sign wraps, it sign
extends the original values (which may be of any user data type)
to `i64`.
This peephole is really helpful in CPU2017/548.exchange2,
where we have multiple statements like this:
```
block(row+1:row+2, 7:9, i7) = block(row+1:row+2, 7:9, i7) - 10
```
While this is just a 2x3-iteration loop nest, LLVM cannot figure
that out and ends up vectorizing the inner loop really hard (with a
vector epilogue and a scalar remainder). This, in turn, causes problems
for LSR, which ends up creating too many loop-carried values in the
loop containing the above statement; these then cause too many
spills/reloads.
Alive2: https://alive2.llvm.org/ce/z/gLgfYX
Related to #143219.
|
|
(#142869)
Fixes #142497.
Alive2: https://alive2.llvm.org/ce/z/CeaHaH
The contents of this pull request were substantially written using
claude-code. I've reviewed it to the best of my ability (it's been
years since I did any compilers work).
---------
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
Co-authored-by: Nikita Popov <github@npopov.com>
|
|
Extract a reusable helper for emitting a sum of multiple GEP
offsets.
|
|
When looking for the common base pointer, support the case where the
type changes because the GEP goes from pointer to vector of pointers.
This was supported prior to #142958.
|
|
Make this available to other parts of InstCombine, to be used for
pointer comparison optimization.
|
|
Fix #93769
alive2: https://alive2.llvm.org/ce/z/MHShQY
|
|
When expanding the offset of a GEP chain via a series of adds, try to
preserve the nsw/nuw flags based on inbounds/nuw. This is a followup to
https://github.com/llvm/llvm-project/pull/142958.
Proof: https://alive2.llvm.org/ce/z/8HiFYY (note that preserving nsw in
the nusw case is not valid)
|
|
Currently OptimizePointerDifference() only handles single GEPs with a
common base, not GEP chains. This patch generalizes the support to
nested GEPs with a common base.
Finding the common base is a bit annoying, because we want to stop as
soon as possible and not recurse into common GEP prefixes.
This helps avoid regressions from
https://github.com/llvm/llvm-project/pull/137297.
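A minimal sketch of the nested-GEP case this enables (illustrative
only; no wrap flags are claimed on the result):
```
; (p + i + j) - p, where %g2 is a gep of a gep off the common base %p
define i64 @src(ptr %p, i64 %i, i64 %j) {
  %g1 = getelementptr i8, ptr %p, i64 %i
  %g2 = getelementptr i8, ptr %g1, i64 %j
  %a = ptrtoint ptr %g2 to i64
  %b = ptrtoint ptr %p to i64
  %d = sub i64 %a, %b
  ret i64 %d
}

; the pointer difference reduces to the sum of the offsets
define i64 @tgt(ptr %p, i64 %i, i64 %j) {
  %d = add i64 %i, %j
  ret i64 %d
}
```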
|
|
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique or writing an easily invalidated KnownBits
analysis, make the Depth argument in ValueTracking APIs uniformly the
last argument, with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
|
|
1) (#122491)
Transform (add (add A, 1), (sext (icmp ne A, 0))) into a call to umax(A, 1).
Fixes #121853.
Alive2: https://alive2.llvm.org/ce/z/TweTan
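The transform in IR, as a sketch (alive2-style src/tgt):
```
define i32 @src(i32 %a) {
  %add1 = add i32 %a, 1
  %cmp = icmp ne i32 %a, 0
  %sext = sext i1 %cmp to i32   ; 0 when a == 0, -1 otherwise
  %r = add i32 %add1, %sext     ; a == 0: 0+1+0 = 1; a != 0: a+1-1 = a
  ret i32 %r
}

define i32 @tgt(i32 %a) {
  %r = call i32 @llvm.umax.i32(i32 %a, i32 1)
  ret i32 %r
}

declare i32 @llvm.umax.i32(i32, i32)
```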
|
|
Closes https://github.com/llvm/llvm-project/issues/121700
|
|
Up to now, the only way to set specific FMF flags in IRBuilder has been
to use `FastMathFlagGuard`, which makes the code ugly and hard to
maintain.
This patch introduces a helper class `FMFSource` to replace the original
parameter `Instruction *FMFSource` in IRBuilder. To maximize
compatibility, it accepts either an instruction or specific FMF.
This patch also removes the use of `FastMathFlagGuard` in some simple
cases.
Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=f87a9db8322643ccbc324e317a75b55903129b55&to=9397e712f6010be15ccf62f12740e9b4a67de2f4&stat=instructions%3Au
|
|
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change for the
Cmp-matchers.
This is a preparatory step in migrating the codebase over to
CmpPredicate. Since no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
|
|
C2` (#119316)
if (C1 & C2) == C2 then (X & C1) - (X & C2) --> X & (C1 ^ C2)
Alive2: https://alive2.llvm.org/ce/z/JvQU8w
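A concrete instance with C1 = 5 and C2 = 4, so that (C1 & C2) == C2
(illustrative src/tgt pair):
```
define i8 @src(i8 %x) {
  %a = and i8 %x, 5
  %b = and i8 %x, 4
  %r = sub i8 %a, %b
  ret i8 %r
}

; X & (5 ^ 4) == X & 1
define i8 @tgt(i8 %x) {
  %r = and i8 %x, 1
  ret i8 %r
}
```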
|
|
Make sure we only access cached nowrap flags.
|
|
If both the geps and the subs are nuw, the new sub is also nuw.
Proof: https://alive2.llvm.org/ce/z/mM8UvF
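A sketch of the intended situation, assuming the fold applies to
pointer differences of nuw geps off a common base (as the proof link
suggests):
```
define i64 @src(ptr %p, i64 %i, i64 %j) {
  %g1 = getelementptr nuw i8, ptr %p, i64 %i
  %g2 = getelementptr nuw i8, ptr %p, i64 %j
  %a = ptrtoint ptr %g1 to i64
  %b = ptrtoint ptr %g2 to i64
  %d = sub nuw i64 %a, %b
  ret i64 %d
}

; nuw carries over to the offset subtraction
define i64 @tgt(ptr %p, i64 %i, i64 %j) {
  %d = sub nuw i64 %i, %j
  ret i64 %d
}
```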
|
|
Alive2: https://alive2.llvm.org/ce/z/oSWe5S
Closes https://github.com/llvm/llvm-project/issues/118155
|
|
This matches m_Select(m_Value(), L, R) or m_Select(m_Value(), R, L).
|
|
The (extended) bit width might not fit into the (non-extended)
type, resulting in an incorrect truncation of the compared value.
Fix this by using m_SpecificInt(), which is both simpler and
handles this correctly.
Fixes the assertion failure reported in:
https://github.com/llvm/llvm-project/pull/114539#issuecomment-2485799395
|
|
(#113933)
This patch re-queues users of a phi when one of its incoming add
instructions is updated. If an add instruction is updated, the analysis
results of the phi may improve, and thus we may be able to further fold
some users of the phi node.
See the following case:
```
define i8 @trunc_in_loop_exit_block() {
; CHECK-LABEL: @trunc_in_loop_exit_block(
; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ 1, [[ENTRY]] ], [ [[IV_NEXT]], [[LOOP_LATCH]] ]
; CHECK-NEXT: [[CMP:%.*]] = icmp samesign ult i32 [[IV]], 100
; CHECK-NEXT: br i1 [[CMP]], label [[LOOP_LATCH]], label [[EXIT:%.*]]
; CHECK: loop.latch:
; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
; CHECK-NEXT: br label [[LOOP]]
; CHECK: exit:
; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[PHI]] to i8
; CHECK-NEXT: ret i8 [[TRUNC]]
;
entry:
br label %loop
loop:
%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
%phi = phi i32 [ 1, %entry ], [ %iv.next, %loop.latch ]
%cmp = icmp ult i32 %iv, 100
br i1 %cmp, label %loop.latch, label %exit
loop.latch:
%iv.next = add i32 %iv, 1
br label %loop
exit:
%trunc = trunc i32 %phi to i8
ret i8 %trunc
}
```
`%iv u< 100` -> infer `nsw/nuw` for `%iv.next = add i32 %iv, 1`
-> `%iv` is non-negative -> infer `samesign` for `%cmp = icmp ult i32
%iv, 100`.
Without re-queuing users of phi nodes, we cannot improve `%cmp` in one
iteration.
Address review comment
https://github.com/llvm/llvm-project/pull/112642#discussion_r1804712271.
This patch also fixes some non-fixpoint issues in tests.
|
|
On a 32-bit target, if pointer arithmetic with `addrspace` is used in
i64 computation, the missed folding in InstCombine results in
suboptimal performance, unlike the same code compiled for a 64-bit
target.
|
|
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation for
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e., just lookup, no creation).
|
|
Added folds:
- `(add (sub X, Y), (sub Z, X))` -> `(sub Z, Y)`
- `(sub (add X, Y), (add X, Z))` -> `(sub Y, Z)`
The fold is typically handled in the `Reassociate` pass, but it fails
if the inner `sub`/`add` are multi-use. Less importantly, Reassociate
doesn't propagate flags correctly.
This patch adds the fold explicitly to InstCombine.
Proofs: https://alive2.llvm.org/ce/z/p6JyRP
Closes #105866
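A minimal IR instance of the first fold (illustrative; no flags
claimed):
```
define i32 @src(i32 %x, i32 %y, i32 %z) {
  %s1 = sub i32 %x, %y
  %s2 = sub i32 %z, %x
  %r = add i32 %s1, %s2   ; (x - y) + (z - x)
  ret i32 %r
}

define i32 @tgt(i32 %x, i32 %y, i32 %z) {
  %r = sub i32 %z, %y
  ret i32 %r
}
```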
|
|
This change also covers the fold of `zext(A > B) - zext(A < B)` since it
is already being canonicalized into the aforementioned pattern.
Proof: https://alive2.llvm.org/ce/z/AgnfMn
|
|
The idea behind this canonicalization is that it allows us to handle
fewer patterns, because we know that some will be canonicalized away.
This is indeed very useful, e.g. for knowing that constants are always
on the right.
However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.
The end result is that things look like they work correctly in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get right, and based on the
regressions this PR originally exposed, we clearly don't get it right
in many cases.
For this reason, I think it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and to make sure that they are handled.
|
|
This was already done for -B + A, but not for A + -B.
Proof: https://alive2.llvm.org/ce/z/F3V2yZ
|
|
Compile-time improvement:
http://llvm-compile-time-tracker.com/compare.php?from=13996378d81c8fa9a364aeaafd7382abbc1db83a&to=861ffa4ec5f7bde5a194a7715593a1b5359eb581&stat=instructions:u
baseline: 803eaf29267c6aae9162d1a83a4a2ae508b440d3
```
Top 5 improvements:
stockfish/movegen.ll 2541620819 2538599412 -0.12%
minetest/profiler.cpp.ll 431724935 431246500 -0.11%
abc/luckySwap.c.ll 581173720 580581935 -0.10%
abc/kitTruth.c.ll 2521936288 2519445570 -0.10%
abc/extraUtilTruth.c.ll 1216674614 1215495502 -0.10%
Top 5 regressions:
openssl/libcrypto-shlib-sm4.ll 1155054721 1155943201 +0.08%
openssl/libcrypto-lib-sm4.ll 1155054838 1155943063 +0.08%
spike/vsm4r_vv.ll 1296430080 1297039258 +0.05%
spike/vsm4r_vs.ll 1312496906 1313093460 +0.05%
nuttx/lib_rand48.c.ll 126201233 126246692 +0.04%
Overall: -0.02112308%
```
|
|
additional use. (#98533)
We have a general fold for (zext (X +nuw C2)) + C1 --> zext (X + (C2 +
trunc(C1))), but this fold is disabled if the zext has an additional
use.
If the two constants cancel, we can fold the whole expression to
zext(X) without increasing the number of instructions.
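A sketch of the cancelling case, with the store standing in for the
additional use and constants chosen so that C2 + trunc(C1) == 0:
```
define i32 @src(i8 %x, ptr %p) {
  %a = add nuw i8 %x, 1
  %z = zext i8 %a to i32
  store i32 %z, ptr %p          ; additional use keeps the zext alive
  %r = add i32 %z, -1
  ret i32 %r
}

define i32 @tgt(i8 %x, ptr %p) {
  %a = add nuw i8 %x, 1
  %z = zext i8 %a to i32
  store i32 %z, ptr %p
  %r = zext i8 %x to i32        ; no new arithmetic needed
  ret i32 %r
}
```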
|
|
(Op01C + Op1C) (#94586)
This patch enables simplifying `sdiv` to `udiv` by preserving the `nsw`
flag for `(X | Op01C) + Op1C --> X + (Op01C + Op1C)` when the sum of
`Op01C` and `Op1C` does not overflow; the `nuw` flag is preserved
unconditionally.
Alive2 Proofs (provided by @nikic): https://alive2.llvm.org/ce/z/nrdCZT,
https://alive2.llvm.org/ce/z/YnJHnH
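A sketch with concrete constants, assuming the usual disjointness
condition on the `or` (which the truncated title elides):
```
define i8 @src(i8 %x) {
  %o = or disjoint i8 %x, 16   ; disjoint: %x shares no set bits with 16
  %r = add nuw i8 %o, 32
  ret i8 %r
}

define i8 @tgt(i8 %x) {
  %r = add nuw i8 %x, 48       ; 16 + 32, nuw preserved
  ret i8 %r
}
```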
|
|
We can convert this to a select based on the `(icmp eq X, C)`, then
constant-fold the addition, with the true arm being `(add C, (sext/zext
1))` and the false arm being `(add X, 0)`, e.g.
- `(select (icmp eq X, C), (add C, (sext/zext 1)), (add X, 0))`.
This is essentially a specialization of the only case that seems to
actually show up from #89020.
Closes #93840
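A sketch with C = 10 and a zext'd compare (names illustrative):
```
define i32 @src(i32 %x) {
  %c = icmp eq i32 %x, 10
  %z = zext i1 %c to i32
  %r = add i32 %x, %z
  ret i32 %r
}

; true arm: 10 + 1 constant-folds to 11; false arm: %x + 0 is just %x
define i32 @tgt(i32 %x) {
  %c = icmp eq i32 %x, 10
  %r = select i1 %c, i32 11, i32 %x
  ret i32 %r
}
```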
|
|
EmitGEPOffset() may remove the old GEP, so be sure to cache the
inbounds flag beforehand.
|
|
Currently, the OptimizePointerDifference fold does not trigger when
working on the sub of two geps where one of the geps has multiple uses,
to avoid duplicating the offset arithmetic too much.
However, there are cases where performing it would still be
clearly profitable, e.g. test_sub_ptradd_multiuse.
This patch drops the one-use restriction using the same strategy we use
in GEP comparison folds: If there are multiple uses, we rewrite the GEP
to use the expanded offset arithmetic instead (effectively
canonicalizing it into ptradd representation).
Fixes https://github.com/llvm/llvm-project/issues/88231.
|
|
As we fold fabs over select in
https://github.com/llvm/llvm-project/pull/86390, this patch folds fneg
over select to make sure nabs idioms are generated.
Addresses
https://github.com/llvm/llvm-project/pull/86390#discussion_r1568862289.
Alive2 for FMF propagation: https://alive2.llvm.org/ce/z/-h6Vuo
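A sketch of the nabs case this unlocks, with a select of fabs and a
constant (illustrative):
```
define float @src(i1 %c, float %x) {
  %f = call float @llvm.fabs.f32(float %x)
  %s = select i1 %c, float %f, float 0.000000e+00
  %r = fneg float %s
  ret float %r
}

; fneg pushed into both arms; fneg of fabs is the nabs idiom
define float @tgt(i1 %c, float %x) {
  %f = call float @llvm.fabs.f32(float %x)
  %n = fneg float %f
  %r = select i1 %c, float %n, float -0.000000e+00
  ret float %r
}

declare float @llvm.fabs.f32(float)
```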
|
|
C2 * C0) + X * C2` (#76285)
Since `DivRemPairPass` runs after `ReassociatePass` in the optimization
pipeline, I decided to do this simplification in `InstCombine`.
Alive2: https://alive2.llvm.org/ce/z/Jgsiqf
Fixes #76128.
|
|
Fixes #88348
Proof:
https://alive2.llvm.org/ce/z/fJnM7t
A test will be added later.
---------
Signed-off-by: ZelinMa557 <3388706467@qq.com>
|
|
In #88217 a large set of matchers was changed to only accept poison
values in splats, but not undef values. This is because we now use
poison for non-demanded vector elements, and allowing undef can cause
correctness issues.
This patch covers the remaining matchers by changing the AllowUndef
parameter of getSplatValue() to AllowPoison instead. We also carry out
corresponding renames in matchers.
As a followup, we may want to change the default for things like m_APInt
to m_APIntAllowPoison (as this is much less risky when only allowing
poison), but this change doesn't do that.
There is one caveat here: We have a single place
(X86FixupVectorConstants) which does require handling of vector splats
with undefs. This is because this works on backend constant pool
entries, which currently still use undef instead of poison for
non-demanded elements (because SDAG as a whole does not have an explicit
poison representation). As it's just the single use, I've open-coded a
getSplatValueAllowUndef() helper there, to discourage use in any other
places.
|
|
(#88566)
Other passes are unable to reverse this transform if we use dominating
conditions.
Fixes #88239.
|