Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
|
|
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
|
|
Move parts of the logic used by Reassociate to OverflowTracking
(mergeFlags & applyFlags) and move the definition to Local.h.
For now it just moves the NUW/NSW handling, as this matches the uses in
LICM. I'll look into the FP math handling separately, as it looks like
there's a difference between Reassociate (takes all flags from I, while
LICM takes the intersection of the flags on both instructions).
PR: https://github.com/llvm/llvm-project/pull/140403
|
|
Reapply "IR: Remove uselist for constantdata (#137313)"
This reverts commit 5936c02c8b9c6d1476f7830517781ce8b6e26e75.
Fix checking uselists of constants in assume bundle queries
|
|
Possibly breaks the build: https://lab.llvm.org/buildbot/#/builders/24/builds/8119
This reverts commit 87f312aad6ede636cd2de5d18f3058bf2caf5651.
|
|
This is a resurrected version of the patch attached to this RFC:
https://discourse.llvm.org/t/rfc-constantdata-should-not-have-use-lists/42606
In this adaptation, there are a few differences. In the original patch, the Use's
use list was replaced with an unsigned* to the reference count in the value. This
version leaves them as null and leaves the ref counting only in Value.
Remove use-lists from instances of ConstantData (which are shared
across modules and have no operands).
To continue supporting most of the use-list API, store a ref-count in
place of the use-list; this is for API like Value::use_empty and
Value::hasNUses. Operations that actually need the use-list -- like
Value::use_begin -- will assert.
This change has three benefits:
1. The compiler output cannot in any way depend on the use-list order
of instances of ConstantData.
2. There's no use-list traffic when adding and removing simple
constants from operand lists (although there is ref-count traffic;
YMMV).
3. It's cheaper to serialize use-lists (since we're no longer
serializing the use-list order of things like i32 0).
The downside is that you can't look at all the users of ConstantData,
but traversals of users of i32 0 are already ill-advised.
Possible follow-ups:
- Track if an instance of a ConstantVector/ConstantArray/etc. is known
to have all ConstantData arguments, and drop the use-lists to
ref-counts in those cases. Callers need to check Value::hasUseList
before iterating through the use-list.
- Remove even the ref-counts. I'm not sure they have any benefit
besides minimizing the scope of this commit, and maintaining the
counts is not free.
Fixes #58629
Co-authored-by: Duncan P. N. Exon Smith <dexonsmith@apple.com>
|
|
|
|
vectors. (#137340)
This has the side effect of fixing the FIXME for when
use-constant-int-for-fixed-length-splat becomes the default.
|
|
When ranking operands for an expression tree the reassociate pass also
perform canonicalization, putting constants on the right hand side. Such
transforms was however not registered as modifying the IR. So at the end
of the pass, if not having made any other changes, the pass returned
that all analyses should be kept.
With this patch we make sure to set MadeChange to true when modifying
the IR via canonicalizeOperands. This is to make sure analyses such as
DemandedBits are properly invalidated when instructions are modified.
|
|
As part of reassociating add instructions, we may factorize some of the
adds and produce a mul instruction; this patch propagates the source
location of the reassociated tree of instructions to the new mul.
Found using https://github.com/llvm/llvm-project/pull/107279.
|
|
As part of RemoveFactorFromExpression, we attempt to remove a factor
from a mul/fmul expression; this may involve generating new
instructions, e.g. to negate the result if the factor was negative in
the original expression. When this happens, the new instructions should
have a DebugLoc set from the instruction that the factored expression is
being used to compute.
Found using https://github.com/llvm/llvm-project/pull/107279.
|
|
Currently in Reassociate we may create a set of new instructions when
optimizing an `add`, but we do not set DebugLocs on the new
instructions; this patch propagates the add's DebugLoc to the new
instructions.
Found using #107279.
|
|
We can use *Set::insert_range to collapse:
for (auto Elem : Range)
Set.insert(E);
down to:
Set.insert_range(Range);
In some cases, we can further fold that into the set declaration.
|
|
underlying string data (NFC) (#128269)
I noticed this when looking at all allocations by clang. For a medium
sized file this was around 6000 calls to operator new, although i
suspect there were more allocations in total as the SmallVectors in
DataLayout may have their own allocations in some cases.
In a follow-up i'm tempted to make the DataLayout copy constructor
private, to avoid this in future. There are a few tests which copy the
DataLayout, and perhaps need to (I didn't check yet), but we could
provide a clone() method for them if needed. Its only accidental copying
I think we should consider avoiding, not people who really do need to
copy it for reasons.
|
|
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to moveBefore use iterators.
This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
|
|
Extends what we already do for i1 types and don't serialize vXi1 logical expressions to improve ILP.
llvm-test-suite numbers
https://github.com/llvm/llvm-project/issues/64840#issuecomment-2053621740
indicate that both reassociations are a net win.
Fixes #64840
Fixes #63946
|
|
In NegateValue in Reassociate, we return the negation of an existing
value in order to break a subtract into an negate + add, potentially
creating a new instruction to perform the negation, but we neglect to
propagate the DebugLoc of the sub being replaced to the negate
instruction if one is created. This patch adds that propagation.
Found using https://github.com/llvm/llvm-project/pull/107279.
|
|
|
|
|
|
Basically the same rules as `add` but we also need to ensure all
operands a non-zero.
Proofs: https://alive2.llvm.org/ce/z/jzsYht
Closes #97040
|
|
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
|
|
These will be replaced later.
|
|
Use the constant folding API instead.
|
|
Fix #95343 .
|
|
|
|
This patch relands #91469 and uses `uint64_t` for repeat count to avoid
a miscompilation caused by overflow
https://github.com/llvm/llvm-project/pull/91469#discussion_r1623925158.
|
|
(#94210)
Reverts
https://github.com/llvm/llvm-project/commit/3bcccb6af685c3132a9ee578b9e11b2503c35a5c
and
https://github.com/llvm/llvm-project/commit/9a282724a29899e84adc91bdeaf639010408a80d
because #91469 causes a miscompilation
https://github.com/llvm/llvm-project/pull/91469#discussion_r1623925158.
|
|
See the following case: https://alive2.llvm.org/ce/z/A-fBki
```
define i3 @src(i3 %0) {
%2 = mul i3 %0, %0
%3 = mul i3 %2, %0
%4 = mul i3 %3, %0
%5 = mul nsw i3 %4, %0
ret i3 %5
}
define i3 @tgt(i3 %0) {
%2 = mul i3 %0, %0
%5 = mul nsw i3 %2, %0
ret i3 %5
}
```
https://github.com/llvm/llvm-project/commit/d7aeefebd6b049f017711cd7c6ef5f217a17b673
introduced weight reduction during weights combination of the same
operand. As the weight of `%0` changes from 5 to 3, the nsw flag in `%5`
should be dropped.
However, the nsw flag isn't cleared by `RewriteExprTree` since `%5 = mul
nsw i3 %0, %4` is not included in the range of `[ExpressionChangedStart,
ExpressionChangedEnd)`.
```
Calculated Rank[] = 3
Combine negations for: %2 = mul i3 %0, %0
Calculated Rank[] = 4
Combine negations for: %3 = mul i3 %0, %2
Calculated Rank[] = 5
Combine negations for: %4 = mul i3 %0, %3
Calculated Rank[] = 6
Combine negations for: %5 = mul nsw i3 %0, %4
LINEARIZE: %5 = mul nsw i3 %0, %4
OPERAND: i3 %0 (1)
ADD USES LEAF: i3 %0 (1)
OPERAND: %4 = mul i3 %0, %3 (1)
DIRECT ADD: %4 = mul i3 %0, %3 (1)
OPERAND: i3 %0 (1)
OPERAND: %3 = mul i3 %0, %2 (1)
DIRECT ADD: %3 = mul i3 %0, %2 (1)
OPERAND: i3 %0 (1)
OPERAND: %2 = mul i3 %0, %0 (1)
DIRECT ADD: %2 = mul i3 %0, %0 (1)
OPERAND: i3 %0 (1)
OPERAND: i3 %0 (1)
RAIn: mul i3 [ %0, #3] [ %0, #3] [ %0, #3]
RAOut: mul i3 [ %0, #3] [ %0, #3] [ %0, #3]
RAOut after CSE reorder: mul i3 [ %0, #3] [ %0, #3] [ %0, #3]
RA: %5 = mul nsw i3 %0, %4
TO: %5 = mul nsw i3 %4, %0
RA: %4 = mul i3 %0, %3
TO: %4 = mul i3 %0, %0
```
The best way to fix this is to inform `RewriteExprTree` to clear flags
of the whole expr tree when weight reduction happens.
But I find that weight reduction based on Carmichael number never
happens in practice.
See the coverage result
https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/home/dtcxzyw/llvm-project/llvm/lib/Transforms/Scalar/Reassociate.cpp.html#L323
I think it would be better to drop `IncorporateWeight`.
Fixes #91417
|
|
We can guarantee NSW on all operands in a reassociated add expression
tree when:
- All adds in an add operator tree are NSW, AND either
- All add operands are guaranteed to be nonnegative, OR
- All adds are also NUW
- Alive2:
- Nonnegative Operands
- 3 operands: https://alive2.llvm.org/ce/z/G4XW6Q
- 4 operands: https://alive2.llvm.org/ce/z/FWcZ6D
- NUW NSW adds: https://alive2.llvm.org/ce/z/vRUxeC
---------
Co-authored-by: Nikita Popov <github@npopov.com>
|
|
As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.
There are two general flavours of update:
* Almost all call-sites just call getIterator on an instruction
* Several make use of an existing iterator (scenarios where the code is
actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block need to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.
Noteworthy changes:
* FindInsertedValue now takes an optional iterator rather than an
instruction pointer, as we need to always insert with iterators,
* I've added a few iterator-taking versions of some value-tracking and
DomTree methods -- they just unwrap the iterator. These are purely
convenience methods to avoid extra syntax in some passes.
* A few calls to getNextNode become std::next instead (to keep in the
theme of using iterators for positions),
* SeparateConstOffsetFromGEP has it's insertion-position field changed.
Noteworthy because it's not a purely localised spelling change.
All this should be NFC.
|
|
Removing debug-intrinsics requires that we always insert with an
iterator, not with an instruction position. To enforce that, we need to
eliminate the `Instruction *` taking functions. It's safe to leave the
insert-at-end-of-block functions as the intention is clear for debug
info purposes (i.e., insert after both instructions and debug-info at
the end of the function).
This patch demonstrates how that needs to happen. At a variety of
call-sites to the `CreateNeg` constructor we need to consider:
* Has this instruction been selected because of the operation it
performs? In that case, just call `getIterator` and pass an iterator in.
* Has this instruction been selected because of it's position? If so, we
need to keep the iterator identifying that position (see the 3rd hunk
changing Reassociate.cpp, although it's coincidentally not debug-info
significant).
This also demonstrates what we'll try and do with the constructor
methods going forwards: have one fully explicit set of parameters
including iterator, and another with default-arguments where the
block-to-insert-into argument defaults to nullptr / no-position,
creating an instruction that hasn't been inserted yet.
|
|
Alive2: https://alive2.llvm.org/ce/z/38KiC_
|
|
Original message:
We still have to keep the noCommonBitsSet call to handle multiple reassociations in one pass. We'll lose the flag on the first reassociation.
|
|
This reverts commit 78964457cf1bafe57a54629fafbd081452a9e528.
Looks like I didn't rebase this correctly before commit
|
|
We still have to keep the noCommonBitsSet call to handle multiple reassociations in one pass. We'll lose the flag on the first reassociation.
|
|
There is support for intrinsics in Instruction::isCommunative, but there
is no equivalent implementation for isAssociative. This patch builds
support for associative intrinsics with TRE as an application. TRE can
now have associative intrinsics as an accumulator. For example:
```
struct Node {
Node *next;
unsigned val;
}
unsigned maxval(struct Node *n) {
if (!n) return 0;
return std::max(n->val, maxval(n->next));
}
```
Can be transformed into:
```
unsigned maxval(struct Node *n) {
struct Node *head = n;
unsigned max = 0; // Identity of unsigned std::max
while (true) {
if (!head) return max;
max = std::max(max, head->val);
head = head->next;
}
return max;
}
```
This example results in about 5x speedup in local runs.
We conservatively only consider min/max and as associative for this
patch to limit testing scope. There are probably other intrinsics that
could be considered associative. There are a few consumers of
isAssociative() that could be impacted. Testing has only required to
Reassociate pass be updated.
|
|
Part of the "RemoveDIs" project to remove debug intrinsics requires
passing block-positions around in iterators rather than as instruction
pointers, allowing some debug-info to reside in BasicBlock::iterator.
This means getInsertionPointAfterDef has to return an iterator, and as
it can return no-instruction that means returning an optional iterator.
This patch changes the signature for getInsertionPtAfterDef and then
patches up the various places that use it to handle the different type.
This would overall be an NFC patch, however in
InstCombinerImpl::freezeOtherUses I've started skipping any debug
intrinsics at the returned insert-position. This should not have any
_meaningful_ effect on the compiler output: at worst it means variable
assignments that are skipped will now cover the freeze instruction and
anything inserted before it, which should be inconsequential.
Sadly: this makes the function signature ugly. This is probably the
ugliest piece of fallout for the "RemoveDIs" work, but it serves the
overall purpose of improving compile times and not allowing `-g` to
affect compiler output, so should be worthwhile in the end.
|
|
Pass SimplifyQuery instead of unpacked list of arguments.
|
|
Reassociation destroys nsw/nuw flags from BinOps that are changed. But if the
expression at the end of a tree that was altered, but didn't change itself,
the flags do not need to be removed. For example, if %a, %b and %c are
reassociated in
%x = add nsw i32 %a, %c
%y = add nsw i32 %x, %b
%z = add nsw i32 %y, %d
The value of %y and so add %y %d remains the same, and %z needn't drop the nsw
flags.
https://alive2.llvm.org/ce/z/_juAiV
Differential Revision: https://reviews.llvm.org/D154289
|
|
# TL;DR #
This patch constrains how much freedom the heuristic that tries to from CSE
expressions has. The added constrain is that the CSE-able expressions must be
within the same basic block as the expressions they get moved before.
# Details #
The reassociation pass currently tweaks the rewrite of the final expression
towards surfacing pairs of operands that would be CSE-able.
This heuristic applies after the regular ordering of the expression.
The regular ordering uses the program structure to choose in which order each
subexpression is materialized. That order follows the topological order.
Now, to expose more CSE opportunities, this heurisitc effectively bypasses the
previous ordering normally defined by the program and pushes up sub-expressions
that are arbitrary deep in the CFG.
E.g., let's say the program order (top to bottom) gives `((a*b)*c)*d)*e` and
`b*e` appears the most in the program. The expression will be reordered in
`(((b*e)*a)*c)*d`
This reordering implies that all the sub expressions (in this example `xx*a`,
then `yy*c`, etc.) will need to appear after the CSE-able expression.
This may over-constrain where the (sub) expressions may live and in particular
it may create loop-dependent expressions.
This patch only allows to move expressions up the expression chain when the
related values are definied in the same basic block as the ones they
"push-down".
This constrain is far for being perfect but at least it avoids accidentally
creating loop dependent variables.
If we really want to expose CSE-able expressions in a proper way, we would need
a profitability metric and also make the decision globally as opposed to one
chain at a time.
I've put the new constrain behind an option to make comparing the old and new
versions easy. However, I believe that even if we find cases where the old
version performs better it is probably by accident. What I am aiming for with
this change is more predictability, then we can improve if need be.
This fixes www.llvm.org/PR61458
Differential Revision: https://reviews.llvm.org/D147457
|
|
|
|
|
|
As shown in the examples in issue #57683, we allow matching
vectors with poison (undef) in this transform (and possibly more),
but we can't then use the partially defined value as a replacement
value in other expressions blindly.
This seems to be avoided in simpler examples of reassociation,
and other passes should be able to clean up the redundant op
seen in these tests.
|
|
Use ConstantFoldUnaryOpOperand() instead. Also make the code below
robust against non-instruction users, just in case it doesn't fold.
|
|
This simplifies the code and fixes handling for the callbr case,
where the instruction needs to be inserted in the normal
destination, rather than after the terminator.
Originally part of D129660.
|
|
|
|
Identified with readability-qualified-auto.
|
|
In D129523, it was noted that there is are some questionable naked casts
from Instruction to BinaryOperator, which could be addressed by doing a
dyn_cast directly to BinaryOperator, avoiding the need for the later cast.
This cleans up that casting.
Reviewed By: nikic, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D130448
|
|
In D129523, it was noted that the approach to check whether a value can
have FastMathFlags was done in different ways, and they should be made
consistent. This patch makes minor changes to fix that.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D130408
|
|
Compiling with '-ffast-math' tuns on all the FastMathFlags (FMF), as
expected, and that enables FP reassociation. Only the two FMF flags
'reassoc' and 'nsz' are technically required to perform reassociation,
but disabling other unrelated FMF bits is needlessly suppressing the
optimization.
This patch fixes that needless suppression, and makes appropriate
adjustments to test-cases, fixing some outstanding TODOs in the process.
Fixes: #56483
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D129523
|