Age | Commit message (Collapse) | Author | Files | Lines |
|
frame offsets (#123872)
The assertion previously did not work correctly because the operand was
being truncated to an `int` prior to comparison.
Change the assertion into a a reported error as suggested in
https://github.com/llvm/llvm-project/pull/101840#issuecomment-2304992425
by @arsenm
Finally, fix the lowering on 64-bit targets so that offsets larger than
32-bit are correctly addressed and add tests for various reported
issues.
|
|
The optimization introduced by #125637 tried to avoid using stacks to
promote bitcast with vector result type. However, it wouldn't be correct
if the input type is vector. This patch limits that optimizations to
only scalar to vector bitcasts.
|
|
|
|
SelectionDAG (#151736)
This works on all cases much like the XOR case above it in SelectionDAG.
|
|
C1)). (#150651)
|
|
Fix #147282 and Follow-up to #148834
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
|
|
This patch adds two DAG combines:
1. vector_interleave(splat, splat, ...) -> {splat,splat,...}
2. concat_vectors(splat, splat, ...) -> wide_splat
where all the input splats are identical. Both of these
together enable us to fold
concat_vectors(vector_interleave(splat, splat, ...))
into a wide splat. Post-legalisation we must only do the
concat_vector combine if the wider type and splat operation
is legal.
For fixed-width vectors the DAG combine only occurs for
interleave factors of 3 or more, however it's not currently
safe to test this for AArch64 since there isn't any lowering
support for fixed-width interleaves. I've only added
fixed-width tests for RISCV.
|
|
|
|
#128400 introduced a use-after-free bug in
`RegAllocBase::cleanupFailedVReg` when removing intervals from regunits.
The issue is from the `InterferenceCache` in `RAGreedy`, which holds
`LiveRange*`. The current `InterferenceCache` APIs make it difficult to
update it, and there isn't a straightforward way to do that.
Since #128400 already mentions it's not clear about the necessity of
removing intervals from regunits, this PR avoids the issue by simply
skipping that step.
Fixes SWDEV-527146.
|
|
Collect the necessary information for constructing the call graph
section, and emit to .callgraph section of the binary.
MD5 hash of the callee_type metadata string is used as the numerical
type id emitted.
Reviewers: ilovepi
Reviewed By: ilovepi
Pull Request: https://github.com/llvm/llvm-project/pull/87576
|
|
(#151244)
SDValue::isOperandOf checks the result number in addition to the SDNode*.
SDNode::isOperandOf only checks the SDNode*.
|
|
https://github.com/llvm/llvm-project/pull/114990 allowed more aggressive
tail duplication for computed-gotos in both pre- and post-regalloc tail
duplication.
In some cases, performing tail-duplication too early can lead to worse
results, especially if we duplicate blocks with a number of phi nodes.
This is causing a ~3% performance regression in some workloads using
Python 3.12.
This patch updates TailDup to delay aggressive tail-duplication for
computed gotos to after register allocation.
This means we can keep the non-duplicated version for a bit longer
throughout the backend, which should reduce compile-time as well as
allowing a number of optimizations and simplifications to trigger before
drastically expanding the CFG.
For the case in https://github.com/llvm/llvm-project/issues/106846, I
get the same performance with and without this patch on Skylake.
PR: https://github.com/llvm/llvm-project/pull/150911
|
|
This reduces the amount of boilerplate required when adding a new
field to MIMetadata and reduces the chance of bugs like the
one I fixed in TargetInstrInfo::reassociateOps.
Reviewers: arsenm, nikic
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/133535
|
|
(#151342)
Currently terminatorIsComputedGoto will return for blocks with a
indirect branch terminator and no successor. If there are no successor,
the terminator is likely not a computed goto, return false in that case.
Note that this is currently NFC, as the only use checks it only if there
are successors, but it will be needed in
https://github.com/llvm/llvm-project/pull/150911.
PR: https://github.com/llvm/llvm-project/pull/151342
|
|
|
|
Fix redundant LEA: https://godbolt.org/z/34xEYE818
|
|
Update MachineFunction::CallSiteInfo to extract numeric CalleeTypeIds
from callee_type metadata attached to indirect call instructions.
Reviewers: nikic, ilovepi
Reviewed By: ilovepi
Pull Request: https://github.com/llvm/llvm-project/pull/87575
|
|
LoopValStage is already of int.
|
|
coalescing SUBREG_TO_REG" (#134408)
This tries to reland #123632 (previously reverted by commit
6b1db79887df19bc8e8c946108966aa6021c8b87)
This PR aims to fix coalescing of SUBREG_TO_REG when sub-register
liveness tracking is enabled and this is now the so-manieth
reincarnation of this effort :)
This change is needed in order to enable subreg liveness tracking for
AArch64, because without the implicit-def, Machine Copy Propagation
would remove a 'redundant' copy because it doesn't realise that the
top 32-bits of the register are zeroed, which subsequent instructions
rely on.
Changes compared to previous PR:
* Rather than updating all instructions that define the source register
(SrcReg) of the SUBREG_TO_REG, this new approach only updates
instructions
that define SrcReg when they dominate the SUBREG_TO_REG. The live-ranges
are updated accordingly.
|
|
This flag applies to G_PTR_ADD instructions and indicates that the operation
implements an inbounds getelementptr operation, i.e., the pointer operand is in
bounds wrt. the allocated object it is based on, and the arithmetic does not
change that.
It is set when the IRTranslator lowers inbounds GEPs (currently only in some
cases, to be extended with a future PR), and in the
(build|materialize)ObjectPtrOffset functions.
Inbounds information is useful in ISel when we have instructions that perform
address computations whose intermediate steps must be in the same memory region
as the final result. A follow-up patch will start using it for AMDGPU's flat
memory instructions, where the immediate offset must not affect the memory
aperture of the address.
This is analogous to a concurrent effort in SDAG: #131862
(related: #140017, #141725).
For SWDEV-516125.
|
|
The "at construction" binop folds in SelectionDAG::getNode() has
different behaviour when compared to the equivalent LLVM IR. This PR
makes the behaviour consistent while also extending the coverage to
include signed/unsigned max/min operations.
|
|
Fold sequences where we extract a bunch of contiguous bits from a value,
merge them into the low bit and then check if the low bits are zero or
not.
Usually the and would be on the outside (the leaves) of the expression,
but the DAG canonicalizes it to a single `and` at the root of the
expression.
The reason I put this in DAGCombiner instead of the target combiner is
because this is a generic, valid transform that's also fairly niche, so
there isn't much risk of a combine loop I think.
See #136727
|
|
|
|
hotness prefix (#150859)
Currently, `TargetLoweringObjectFileELF::getSectionForConstant` produce
`.<section>.hot` or `.<section>.unlikely` for a constant with non-empty
section prefix. This PR changes the implementation add trailing dot when
section prefix is not empty, to disambiguate `.hot` as a hotness prefix
from `.hot` as a (pure C) variable name.
Relevant discussions are in
https://github.com/llvm/llvm-project/pull/148985#discussion_r2221141273
and
https://github.com/llvm/llvm-project/pull/148985#discussion_r2233382641
and
|
|
Cygwin and MinGW share the auto import behavior that could result in
__stack_check_guard being non-dso-local. Allow windres to assume a
Cygwin target as well as a MinGW one, so defines like _WIN32 would not
be present on Cygwin.
|
|
Remove AssertZext if the input ensures the assert cannot fail.
|
|
These functions are for building G_PTR_ADDs when we know that the base
pointer and the result are both valid pointers into (or just after) the
same object. They are similar to SelectionDAG::getObjectPtrOffset.
This PR also changes call sites of the generic (build|materialize)PtrAdd
functions that implement pointer arithmetic to split large memory
accesses to the new functions. Since memory accesses have to fit into an
object in memory, pointer arithmetic to an offset into a large memory
access also yields an address in that object.
Currently, these (build|materialize)ObjectPtrOffset functions only add
"nuw" to the generated G_PTR_ADD, but I intend to introduce an
"inbounds" MIFlag in a later PR (analogous to a concurrent effort in
SDAG: #131862, related: #140017, #141725) that will also be set in the
(build|materialize)ObjectPtrOffset functions.
Most test changes just add "nuw" to G_PTR_ADDs. Exceptions are AMDGPU's
call-outgoing-stack-args.ll, flat-scratch.ll, and freeze.ll tests, where
offsets are now folded into scratch instructions, and cases where the
behavior of the check regeneration script changed, resulting, e.g., in
better checks for "nusw G_PTR_ADD" instructions, matched empty lines,
and the use of "CHECK-NEXT" in MIPS tests.
For SWDEV-516125.
|
|
|
|
fbf6271c7da20356d7b34583b3711b4126ca1dbb introduced an assertion failure as
setDebugValueUndef was called on DBG_LABELs, which isn't allowed and doesn't
make sense. Fix by skipping the call for DBG_LABELs and hoisting, in line with
the original behaviour.
|
|
Split out from https://github.com/llvm/llvm-project/pull/150248:
Specify that the argument of lifetime.start/lifetime.end is ignored and
will be removed in the future.
Remove lifetime size handling from SDAG. The size was previously
discarded during isel, so was always ignored for stack coloring anyway.
Where necessary, obtain the size of the full frame index.
|
|
This PR adds a new interface to IRBuilder called CreateVectorInterleave,
which can be used to create vector.interleave intrinsics of factors 2-8.
For convenience I have also moved getInterleaveIntrinsicID and
getDeinterleaveIntrinsicID from VectorUtils.cpp to Intrinsics.cpp where
it can be used by IRBuilder.
|
|
This is the GlobalISel part to remove `UnsafeFPMath` flag in CodeGen
pipeline.
|
|
This is the following PR of
https://github.com/llvm/llvm-project/pull/136553 which calculate
NoaliasAddrSpace.
This PR carries the info calculated into MIR by adding it into AAMDnodes
|
|
These global flags hinder further improvements like [[RFC] Honor pragmas
with
-ffp-contract=fast](https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast)
and pass concurrency support. Remove them incrementally.
|
|
This is a refinment of #145565 . That PR added support for "Windows
Secure Hot-patching". In this design, functions that are compiled for
hot-patching need to be modified when they access mutable global
variables. The modification is to insert a level of indirection, the
so-called `__ref_*` variables.
Ref variables are supposed to be inserted into the `.rdata` section, not
`.data`. This provides a degree of protection against modification
(accidental or malicious) of ref variables during program execution.
When the Windows hot-patch subsystem loads a module as a hot-patch, it
finds all ref variables and changes the page protections for the pages
containing them to read/write. Then it sets the ref variables to point
to the real variable locations within the base image. Then it changes
page protections back to read-only.
This relies on the variables being placed in the `.rdata` section, not
`.data`.
However, it is still important that the LLVM `GlobalVariable` that is
created for the ref variable be created with `isConstant = false`. This
prevents LLVM from optimizing accesses to the `GlobalVariable`, i.e.
assuming that the variable can never change and thus inlining its value
into expressions that would ordinarily dereference it. That optimization
would defeat the purpose of hot-patching, so `isConstant = false` is
still the correct value for these ref variables.
|
|
This reverts commit 05e08cdb3e576cc0887d1507ebd2f756460c7db7.
Adding the missing -mtriple flags in MIR/X86 test files which caused
these tests to fail which was the reason for reverting the patch.
|
|
|
|
Reapply #140091.
branch-folder hoists common instructions from TBB and FBB into their
pred. Without this patch it achieves this by splicing the instructions from TBB
and deleting the common ones in FBB. That moves the debug locations and debug
instructions from TBB into the pred without modification, which is not
ideal. Debug locations are handled in #140063.
This patch handles debug instructions - in the simplest way possible, which is
to just kill (undef) them. We kill and hoist the ones in FBB as well as TBB
because otherwise the fact there's an assignment on the code path is deleted
(which might lead to a prior location extending further than it should).
There's possibly something we could do to preserve some variable locations in
some cases, but this is the easiest not-incorrect thing to do.
Note I had to replace the constant DBG_VALUEs to use registers in the test- it
turns out setDebugValueUndef doesn't undef constant DBG_VALUEs... which feels
wrong to me, but isn't something I want to touch right now.
---
Fix end-iterator-dereference and add test.
|
|
These float operations were expanded for scalar f32/f64/f128, but not
for f16 and more problematically, not for vectors. A small subset of
them was separately set to expand for vectors.
Change these to always expand by default, and adjust targets to mark
these as legal where necessary instead.
This is a much safer default, and avoids unnecessary legalization
failures because a target failed to manually mark them as expand.
Fixes https://github.com/llvm/llvm-project/issues/110753.
Fixes https://github.com/llvm/llvm-project/issues/121390.
|
|
Those are metadata sections for ELF but was not properly set for COFF.
|
|
getLabelAfterInsn() already returns MCSymbol *.
|
|
I had dropped the check for which intrinsics were supported. This is
a quick fix to get tree back into an unbroken state, a cleaner change
may follow.
|
|
The object file format specific derived classes are used in context like
MCStreamer and MCObjectTargetWriter where the type is statically known.
We don't use isa/dyn_cast and we want to eliminate
MCSection::SectionVariant in the base class.
|
|
The object file format specific derived classes are used in context like
MCStreamer and MCObjectTargetWriter where the type is statically known.
We don't use isa/dyn_cast and we want to eliminate
MCSection::SectionVariant in the base class.
|
|
types (#145197)
This is a starting point to have better legalization failure diagnostics
|
|
The object file format specific derived classes are used in context like
MCStreamer and MCObjectTargetWriter where the type is statically known.
We don't use isa/dyn_cast and we want to eliminate
MCSection::SectionVariant in the base class.
|
|
Closes https://github.com/llvm/llvm-project/issues/150611.
|
|
Reverts llvm/llvm-project#149999
https://lab.llvm.org/buildbot/#/builders/139/builds/17622
|
|
Reapply #140091.
branch-folder hoists common instructions from TBB and FBB into their
pred. Without this patch it achieves this by splicing the instructions from TBB
and deleting the common ones in FBB. That moves the debug locations and debug
instructions from TBB into the pred without modification, which is not
ideal. Debug locations are handled in #140063.
This patch handles debug instructions - in the simplest way possible, which is
to just kill (undef) them. We kill and hoist the ones in FBB as well as TBB
because otherwise the fact there's an assignment on the code path is deleted
(which might lead to a prior location extending further than it should).
There's possibly something we could do to preserve some variable locations in
some cases, but this is the easiest not-incorrect thing to do.
Note I had to replace the constant DBG_VALUEs to use registers in the test- it
turns out setDebugValueUndef doesn't undef constant DBG_VALUEs... which feels
wrong to me, but isn't something I want to touch right now.
|
|
This extends the fixed vector lowering to support the case where the
mask is formed via shufflevector idiom.
---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
|