Age | Commit message (Collapse) | Author | Files | Lines |
|
After 9fe78db4, the pass inserts `store volatile i32 -1, ptr %call_site`
before all invoke instruction except the one in the entry block, which
has the effect of bypassing landing pads on exceptions.
When configuring the call site for a potentially throwing instruction
check that it is not `InvokeInst` -- they are handled by earlier code.
|
|
when optimizations are disabled (#117459)
Handle \@llvm.expect.with.probability in SelectionDAGBuilder, FastISel,
and IntrinsicLowering in the same way \@llvm.expect is handled, where
the value is passed through as-is. This can be reached if the intrinsic
is used without optimizations, where it would otherwise be properly
transformed out.
Fixes #115411 for SelectionDAG. A similar patch is likely needed for
GlobalISel.
|
|
With cb57b7a7, MachineLateInstrsCleanup switched to using a map to keep
track of kill flags to remedy compile time regressions seen with huge
functions. It seems that the comment above clearKillsForDef() became stale with
that commit, and also that one of the arguments to it became unused,
both of which this patch fixes.
|
|
LegalizerHelp only supported f128 libcalls and incorrectly assumed that
the destination register for the G_FCMP was s32.
|
|
(#117320)
The optimiser will produce empty blocks that are unconditionally
executed according to the CFG -- while it may not be meaningful code,
and won't get a prologue_end position, we need to not crash on this
input.
The fault comes from assuming that there's always a next block with some
instructions in it, that will eventually produce some meaningful control
flow to stop at -- in the given reproducer in issue #117206 this isn't
true, because the function terminates with `unreachable`. Thus, I've
refactored the "get next instruction logic" into a helper that'll step
through all blocks and terminate if there aren't any more.
Reproducer from aeubanks
|
|
Assert that the passed value is a valid unsigned integer value for the
specified type.
For signed values getSignedConstant() / getSignedTargetConstant() should
be used instead.
|
|
TargetConstant. (#117639)
Fix all the places I could find that did't do this. We were already
mostly correct for FP_ROUND after
9a976f36615dbe15e76c12b22f711b2e597a8e51, but not STRICT_FP_ROUND.
|
|
This looks like a rather weird change, so let me explain why this isn't
as unreasonable as it looks. Let's start with the problem it's solving.
```
define signext i32 @overlap_live_ranges(ptr %arg, i32 signext %arg1) { bb:
%i = icmp eq i32 %arg1, 1
br i1 %i, label %bb2, label %bb5
bb2: ; preds = %bb
%i3 = getelementptr inbounds nuw i8, ptr %arg, i64 4
%i4 = load i32, ptr %i3, align 4
br label %bb5
bb5: ; preds = %bb2, %bb
%i6 = phi i32 [ %i4, %bb2 ], [ 13, %bb ]
ret i32 %i6
}
```
Right now, we codegen this as:
```
li a3, 1
li a2, 13
bne a1, a3, .LBB0_2
lw a2, 4(a0)
.LBB0_2:
mv a0, a2
ret
```
In this example, we have two values which must be assigned to a0 per the
ABI (%arg, and the return value). SelectionDAG ensures that all values
used in a successor phi are defined before exit the predecessor block.
This creates an ADDI to materialize the immediate in the entry block.
Currently, this ADDI is not sunk into the tail block because we'd have
to split a critical edges to do so. Note that if our immediate was
anything large enough to require two instructions we *would* split this
critical edge.
Looking at other targets, we notice that they don't seem to have this
problem. They perform the sinking, and tail duplication that we don't.
Why? Well, it turns out for AArch64 that this is entirely an accident of
the existance of the gpr32all register class. The immediate is
materialized into the gpr32 class, and then copied into the gpr32all
register class. The existance of that copy puts us right back into the
two instruction case noted above.
This change essentially just bypasses this emergent behavior aspect of
the aarch64 behavior, and implements the same "always sink immediates"
behavior for RISCV as well.
|
|
These macros are created inside a function. They should be undefined
before the end of the function.
|
|
integer promotion of STRICT_FLDEXP in type legalizer. (#117633)
A special case in type legalization wasn't accounting for different
operand numbering between FLDEXP and STRICT_FLDEXP.
AArch64 already asked STRICT_FLDEXP to be promoted, but had no test for
it.
|
|
This update follows up on change #112671 and is mostly a NFC, with the following exceptions:
- Introduced `-global-merging-skip-no-params` to bypass merging when no parameters are required.
- Parameter count is now calculated based on the unique hash count.
- Added `-global-merging-inst-overhead` to adjust the instruction overhead, reflecting the machine instruction size.
- Costs and benefits are now computed using the double data type. Since the finalization process occurs offline, this should not significantly impact build time.
- Moved a sorting operation outside of the loop.
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
|
|
This reverts commit fdf1f69c57ac3667d27c35e097040284edb1f574.
|
|
This update follows up on change #112671 and is mostly a NFC, with the following exceptions:
- Introduced `-global-merging-skip-no-params` to bypass merging when no parameters are required.
- Parameter count is now calculated based on the unique hash count.
- Added `-global-merging-inst-overhead` to adjust the instruction overhead, reflecting the machine instruction size.
- Costs and benefits are now computed using the double data type. Since the finalization process occurs offline, this should not significantly impact build time.
- Moved a sorting operation outside of the loop.
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
|
|
operations (#115745)"
This reverts commit b5a11d378db4b39ceb085ebd59c941e9369d9596.
|
|
(#116031)" (#117556)
This reverts commit 22ec44f509ff266b581dbb490d7b040473b7c31a.
|
|
(#115745)
* Enables conversion of several select-like instructions within one
group
* Any number of auxiliary instructions depending on the same condition
can be in between select-like instructions
* After splitting the basic block, move select-like instructions into
the relevant basic blocks and optimise them
* Make it easier to add support shift-base select-like instructions and
also any mixture of zext/sext/not instructions
|
|
For IR like this:
%icmp = icmp ult <4 x i32> %a, splat (i32 5)
%res = extractelement <4 x i1> %icmp, i32 1
where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such add, sub, etc. and convert
this into
%ext = extractelement <4 x i32> %a, i32 1
%res = icmp ult i32 %ext, 5
For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.
NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file
CodeGen/AArch64/extract-vector-cmp.ll
because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
|
|
Create signed constant using getSignedConstant(), to avoid future
assertion failures when we disable implicit truncation in getConstant().
This also touches some generic legalization code, which apparently only
AMDGPU tests.
|
|
For the RISC-V target, V14_V15 are not subregisters of v14m4, even
though they share some registers. Currently, the MachineVerifier reports
an error when checking register liveness for segment load/store
operations.
This patch adds additional register liveness checking, using RegUnit
instead of subregisters, to prevent this error.
|
|
Building vp intrinsic functions using a unified interface for
expandPredicationToIntCall/expandPredicationToFPCall/expandPredicationToCastIntrinsic
functions.
|
|
This converts all ptr element shuffle vectors to s64, so that the
existing vector legalization handling can lower them as needed. This
prevents a lot of fallbacks that currently try to generate things like
`<2 x ptr> G_EXT`.
I'm not sure if bitcast/inttoptr/ptrtoint is intended to be necessary
for vectors of pointers, but it uses buildCast for the casts, which now
generates a ptrtoint/inttoptr.
|
|
This PR allows mixing `-basic-block-sections` with
`-enable-machine-function-splitter`. The strategy is to let
`-basic-block-sections` take precedence over functions with profiles.
|
|
Fixes #111950
|
|
Sometimes we want to use a `PgoAnalysisMap` feature that doesn't require
the BB entries info, e.g. only the `FuncEntryCount`, but the BB entries
is emitted by default, so I'm adding an option to skip the info for this
case to save the binary size(can save ~90% size of the section). For
implementation, it extends a new field(`OmitBBEntries`) in
`BBAddrMap::Features` for this and it's controlled by a switch
`--basic-block-address-map-skip-bb-entries`.
Note that this naturally supports backwards compatibility as the field
is zero for the old version, matches the decoding in the new version
llvm.
|
|
(#116191)" (#117367)
To pass tests with #117307 revert.
This reverts commit 3093b29b597b9a936a3e4d1c8bc4a7ccba8fc848.
|
|
(#117365)
Maybe not needed but to avoid conflicts with #117307
Without revert of this one, but reverting #117307, the
regenerated init-undef.mir became empty.
This reverts commit be15fd5085680cc5ed9ec4f4f2258b504cdd55db.
|
|
|
|
Type and register class aren't mutually exclusive in gMIR but there's also
no target-independent requirement (yet?) to have both on target instructions.
|
|
`isVectorIntrinsicWithOverloadTypeAtArg` api (#114849)
This changes allows target intrinsics to specify and overwrite overloaded types.
- Updates `ReplaceWithVecLib` to not provide TTI as there most probably won't be a use-case
- Updates `SLPVectorizer` to use available TTI
- Updates `VPTransformState` to pass down TTI
- Updates `VPlanRecipe` to use passed-down TTI
This change will let us add scalarization for `asdouble`: #114847
|
|
The improvements in 63917e1 / #70796 do not check for memory
barriers/unmodelled sideeffects, which means we may incorrectly hoist
loads across memory barriers.
Fix this by checking any machine instruction in the loop is a load-fold
barrier.
PR: https://github.com/llvm/llvm-project/pull/116987
|
|
Currently the function will walk the entire DAG to find other candidates
to perform a post-inc store. This leads to very long compilation times
on large functions. Added a MaxSteps limit to avoid this, which is also
aligned to how hasPredecessorHelper is used elsewhere in the code.
|
|
Closes #116767
|
|
avoid copying (#116935)
|
|
Now that `-fbasic-block-sections=list` is enabled for Arm, functions may
be split aross multiple sections, and CFI information must be handled
independently for each section.
On x86, this is handled in `llvm/lib/CodeGen/CFIInstrInserter.cpp`.
However, this pass does not run on Arm, so we must add logic for it
to `llvm/lib/CodeGen/CFIFixup.cpp`.
|
|
|
|
There's nothing that specific to FSINCOS about these; they could be used
for similar nodes in the future.
|
|
De-duplicate the functions getSignedPredicate and getUnsignedPredicate,
nearly identical versions of which were present in CmpInst and ICmpInst,
creating less confusion.
|
|
Fix the comparator in `stable_sort()` to satisfy the strict weak
ordering requirement.
In https://github.com/llvm/llvm-project/pull/115367 this comparator was
changed to use `getCycleDepth()` when `shouldOptimizeForSize()` is true.
However, I mistakenly changed to logic so that we use `LHSFreq <
RHSFreq` if **either** of them are zero. This causes us to fail the last
requirment (https://en.cppreference.com/w/cpp/named_req/Compare).
> if comp(a, b) == true and comp(b, c) == true then comp(a, c) == true
|
|
During RDF graph construction, linkRefUp method links a register ref to
its upward reaching defs until all RegUnits of the ref have been covered
by defs.
However, when a sub-register def covers some, but not all, of the
RegUnits of a previous super-register def, a super-register ref is not
linked to the super-register def.
This can result in certain super register defs being dead code
eliminated.
This patch fixes the cover check for a register ref. A def must be
skipped only when all RegUnits of that def have already been covered by
a previously seen def.
|
|
When GlobalMerge creates a MergedGlobal of statics all initialized to
zero, emitGlobalConstantImpl sees a ConstantAggregateZero. This results
in just emitting zeros followed by labels for the aliases. We need to
handle it more like how emitGlobalConstantStruct does by emitting each
global inside the aggregate.
---------
Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>
|
|
In a situation like the following:
```
undef %2.subreg = INST %1 ; DefMI (rematerializable),
; DefSubIdx = subreg
%3 = COPY %2 ; SrcIdx = DstIdx = 0
.... = SOMEINSTR %3, %2
```
there are no subranges for `%3` because the entire register is copied,
but after rematerialization the subrange of the rematerialized value
must be fixed up with the appropriate subranges for `.subreg`.
(To me this issue seemed a bit similar to the issue fixed by #96839, but
then related to rematerialization)
|
|
be extensible. (#115563)"
This reverts commit 2de78815604e9027efd93cac27c517bf732587d2.
Reverted due to buildbot failure:
unittests/IR/CMakeFiles/IRTests.dir/DroppedVariableStatsIRTest.cpp.o:DroppedVariableStatsIRTest.cpp:function llvm::DroppedVariableStatsIR::runAfterPass(llvm::StringRef, llvm::Any): error: undefined reference to 'llvm::DroppedVariableStatsIR::runOnModule(llvm::Module const*, bool)'
|
|
This reverts commit 6e2b77d4696d4a672635c0ba1ead4824e2158a7d.
Reverting due to buildbot failure:
unittests/IR/CMakeFiles/IRTests.dir/DroppedVariableStatsIRTest.cpp.o:DroppedVariableStatsIRTest.cpp:function llvm::DroppedVariableStatsIR::runAfterPass(llvm::StringRef, llvm::Any): error: undefined reference to 'llvm::DroppedVariableStatsIR::runOnModule(llvm::Module const*, bool)'
|
|
This patch uses the DroppedVariableStats class to add dropped variable
statistics for MIR passes.
|
|
extensible. (#115563)
Move DroppedVariableStats code to its own file and change the class to
have an extensible design so that we can use it to add dropped
statistics to MIR passes and the instruction selector.
|
|
(#115979)
This prevents the assertion from wrongly triggering on invalid LLT's
|
|
We import the llvm.ssub.with.overflow.* Intrinsics, but the Legalizer
also builds them while legalizing other opcodes, see narrowScalarAddSub.
|
|
Add integer promotion support for for VP_LOAD and VP_STORE via legalization of extend
and truncate of each form.
Patch commandeered from: https://reviews.llvm.org/D109377
|
|
https://github.com/llvm/llvm-project/pull/109711 disables
`buildCFGChains()` when `-apply-ext-tsp-for-size` is used to improve
codesize. Tail merging can change the layout and normally requires
`buildCFGChains()` to be called again, but we want to prevent this when
optimizing for codesize. We saw slight size improvement on large
binaries with this change. If `-apply-ext-tsp-for-size` is not used,
this should be a NFC.
|
|
With this, all machine SSA optimization passes are available in the new codegen pipeline.
|