|
Now that the old llvm::identity has moved into IndexedMap.h under a
different name, this patch renames identity_cxx20 to identity. Note
that llvm::identity closely models std::identity from C++20.
|
|
Eventually this should be program state, and not part of TargetLowering,
so avoid direct references to the libcall functions in it.
The usage of RuntimeLibcallsInfo here is not good though, particularly
the use through TargetTransformInfo. It would be better if the IR
attributes were directly encoded in the libcall definition (or at least made
consistent elsewhere). The parsing of the attributes should also not be
responsible for the libcall recognition, which is the only part pulling in
the dependency.
|
|
This has been dead since 97bfb936af4077e8cb6c75664231f27a9989d563
|
|
Add GlobalISel lowering of G_FMINIMUM and G_FMAXIMUM following the same
logic as in SDag's expandFMINIMUM_FMAXIMUM.
Update the AMDGPU legalization rules: pre-GFX12 now uses the new lowering
method, and G_FMINNUM_IEEE and G_FMAXNUM_IEEE are made legal to match SDag.
|
|
All 3 implementations just check whether this has the Windows check
function, so merge that into a single implementation.
|
|
I'm not sure if this is the best way forward or not, but we keep running
into issues with forgetting that shuffle_vectors can be scalar (there is
another example in the recently added known-bits code). As a scalar-dst
shuffle vector is just an extract, and a scalar-source shuffle vector is
just a build vector, this patch makes scalar shuffle vectors illegal and
adjusts the IRBuilder to create the correct node as required.
Most targets do this already through lowering or combines. Making scalar
shuffles illegal simplifies GISel as a whole; it just requires that
transforms that create shuffles of new sizes account for the scalar
shuffle being illegal (mostly IRBuilder and LessElements).
|
|
(#164329)
Add support for Machine IR (MIR) triplet and entity generation in llvm-ir2vec.
This change extends llvm-ir2vec to support Machine IR (MIR) in addition to LLVM IR, enabling the generation of training data for MIR2Vec embeddings. MIR2Vec provides machine-level code embeddings that capture target-specific instruction semantics, complementing the target-independent IR2Vec embeddings.
- Extended llvm-ir2vec to support triplet and entity generation for Machine IR (MIR)
- Added `--mode=mir` option to specify MIR mode (vs LLVM IR mode)
- Implemented MIR triplet generation with Next and Arg relationships
- Added entity mapping generation for MIR vocabulary
- Updated documentation to explain MIR-specific features and usage
(Partially addresses #162200; tracking issue: #141817)
|
|
Adding Matching and Inference Functionality to Propeller. For detailed
information, please refer to the following RFC:
https://discourse.llvm.org/t/rfc-adding-matching-and-inference-functionality-to-propeller/86238.
This is the second PR, which includes the calculation of basic block
hashes and their emission to the ELF file. It is associated with the
previous PR at https://github.com/llvm/llvm-project/pull/160706.
Co-authored-by: lifengxiang1025 <lifengxiang@kuaishou.com>
Co-authored-by: zcfh <wuminghui03@kuaishou.com>
Co-authored-by: Rahman Lavaee <rahmanl@google.com>
|
|
This patch introduces SDNodeFlags::InBounds, to show that an ISD::PTRADD SDNode
implements an inbounds getelementptr operation (i.e., the pointer operand is in
bounds wrt. an allocated object it is based on, and the arithmetic does not
change that). The flag is set in the DAG construction when lowering inbounds
GEPs.
Inbounds information is useful in ISel when selecting memory instructions
that perform address computations whose intermediate steps must be in the same
memory region as the final result. Follow-up patches to propagate the flag in
DAGCombines and to use it when lowering AMDGPU's flat memory instructions,
where the immediate offset must not affect the memory aperture of the address
(similar to this GISel patch: #153001), are planned.
This mirrors #150900, which has introduced a similar flag in GlobalISel.
This patch supersedes #131862, which previously attempted to introduce an
SDNodeFlags::InBounds flag. The difference between this PR and #131862 is that
there is now an ISD::PTRADD opcode (PR #140017) and the InBounds flag is only
defined to apply to ISD::PTRADD DAG nodes. It is therefore unambiguous that
in-bounds-ness refers to a memory object into which the left operand of the
PTRADD node points (in contrast to #131862, where InBounds would have applied
to commutative ISD::ADD nodes, so that the semantics would be more difficult to
reason about).
For SWDEV-516125.
|
|
Fixes #142146
Add a nullptr check where a pass accepts `const TargetMachine &` in its
constructor, though the check is still not exhaustive.
|
|
Add MIR2Vec support to the llvm-ir2vec tool, enabling embedding generation for Machine IR alongside the existing LLVM IR functionality.
(This is an initial integration; other entity/triplet generation for vocabulary construction will follow as separate patches.)
|
|
Handle opcodes in embedding computation.
- Revamped the MIR vocabulary with four sections: `Opcodes`, `Common Operands`, `Physical Registers`, and `Virtual Registers`
- Operands broadly fall into 3 categories: the generic MO types that are common across architectures, physical register classes, and virtual register classes. We handle these categories separately in MIR2Vec. (Though the same register classes cover both physical and virtual registers, their embeddings differ.)
|
|
Note that "override" makes "virtual" redundant.
Identified with modernize-use-override.
|
|
We've switched to llvm::identity_cxx20 for SparseMultiSet, so we don't
need llvm::identity in this file.
|
|
`ISD::VSelect` (#164069)
Fixes #150019
|
|
Change function signature to avoid copying the argument at call site.
|
|
Remove `UnsafeFPMath` uses; the next step is removing the command line option.
|
|
Implement the MIR2Vec embedder for generating vector representations of Machine IR instructions, basic blocks, and functions. This patch introduces the changes necessary to *embed* machine opcodes. Machine operands will be handled incrementally in upcoming patches.
|
|
The 'A' atomics extension is composed of two subextensions, 'Zaamo'
which has atomic memory operation instructions, and 'Zalrsc' which has
load-reserve / store-conditional instructions.
For machines where 'Zalrsc' is present but 'Zaamo' is not, implement
and enable atomic memory operations through pseudo expansion. Update the
predication and lowering control to be more precise about which
'Zaamo'/'Zalrsc' feature is truly required.
There is no functional change for subtargets supporting 'A', while
'Zalrsc'-only subtargets can now utilize atomics at an increased code
footprint.
|
|
The legacy llvm::identity is not quite the same as std::identity from
C++20. llvm::identity is a template struct with an ::argument_type
member. In contrast, llvm::identity_cxx20 (and std::identity) is a
non-template struct with a templated call operator and no
::argument_type.
This patch modernizes llvm::SparseSet by updating its default
key-extraction functor to llvm::identity_cxx20. A new template
parameter KeyT takes over the role of ::argument_type.
Existing uses of SparseSet are updated for the new template signature.
Most use sites are of the form SparseSet<T>, requiring no update.
|
|
The legacy llvm::identity is not quite the same as std::identity from
C++20. llvm::identity is a template struct with an ::argument_type
member. In contrast, llvm::identity_cxx20 (and std::identity) is a
non-template struct with a templated call operator and no
::argument_type.
This patch modernizes llvm::SparseMultiSet by updating its default
key-extraction functor to llvm::identity_cxx20. A new template
parameter KeyT takes over the role of ::argument_type.
Existing uses of SparseMultiSet are updated for the new template
signature.
|
|
Extract extern variable declaration into a header per
https://discourse.llvm.org/t/rfc-cs-changes-for-standalone-variables/88581
|
|
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.
This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)
It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
|
|
Mutation `changeElementCountTo` now uses `ElementCount`
|
|
Most `SourceMgr` clients don't make use of include files, but those that
do might want to specify the file system to use. This patch enables that
by making it possible to pass a `vfs::FileSystem` instance into
`SourceMgr`.
|
|
This patch replaces LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]]. Note
that this patch adjusts the placement of [[maybe_unused]] to comply
with C++17's placement rules.
|
|
This patch replaces LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]] where
we do not need to move the position of [[maybe_unused]] within
declarations.
Notes:
- [[maybe_unused]] is a standard feature of C++17.
- The compiler is far more lenient about the placement of
__attribute__((unused)) than that of [[maybe_unused]].
I'll follow up with another patch to finish up the rest.
|
|
Reland #161355, after fixing up the cross-projects-tests for the wasm
simd intrinsics.
Original commit message:
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions.
If we have FP16, then lower v8f16 fmuladds to FMA.
I've introduced an ISD node for fmuladd to maintain the rounding
ambiguity through legalization / combine / isel.
|
|
Fixes #122624.
Assisted-by: gpt-5-codex
|
|
Reverts llvm/llvm-project#161355
Looks like I've broken some intrinsic code generation.
|
|
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions.
If we have FP16, then lower v8f16 fmuladds to FMA.
I've introduced an ISD node for fmuladd to maintain the rounding
ambiguity through legalization / combine / isel.
|
|
shouldFoldConstantShiftPairToMask (NFC) (#156949)
This should be based on the type and instructions, and only Thumb uses
the combine level anyway.
|
|
Make the .callgraph section's layout space-efficient, and document the
layout of the section.
|
|
Using SlotIndexes::getMBBEndIdx() isn't the correct choice here because
it returns the starting index of the next basic block, causing live-ins
of the next block to be calculated instead of the intended live-outs of
the current block.
Add SlotIndexes::getMBBLastIdx() method to return the last valid
SlotIndex within a basic block, enabling correct live-out calculations.
|
|
Added factory methods for vocabulary creation. This also fixes a UB
issue introduced by #161713.
|
|
The `cmpxchg` instruction has two memory orders, one for success and one
for failure.
Prior to this patch `LegalityQuery` only exposed a single memory order,
that of the success case. This meant that it was not generally possible
to legalize `cmpxchg` instructions based on their memory orders.
Add a `FailureOrdering` field to `LegalityQuery::MemDesc`; it is only
set for `cmpxchg` instructions, otherwise it is `NotAtomic`. I didn't
rename `Ordering` to `SuccessOrdering` or similar to avoid breaking
changes for out-of-tree targets.
The new field does not increase `sizeof(MemDesc)`, it falls into
previous padding bits due to alignment, so I'd expect there to be no
performance impact for this change.
Verified no breakage via check-llvm in a build with the AMDGPU, AArch64,
and X86 targets enabled.
|
|
This is a port of the SDAG DAGCombiner::combineRepeatedFPDivisors
combine that looks for multiple fdiv operations with the same divisor
and converts them to a single reciprocal fdiv and multiple fmuls. It is
currently a fairly faithful port, with some additions to make sure that
the newly created fdiv dominates all new uses. Compared to the SDAG
version it also drops some logic about splat uses which assumes no
vector fdivs and some logic about x/sqrt(x) which does not yet apply to
GISel.
|
|
Refactor MIRVocabulary to improve opcode lookup and add a Section enum for better organization. This is useful for embedder lookups in upcoming patches.
(Tracking issue - #141817)
|
|
This PR introduces the initial infrastructure and vocabulary necessary for generating embeddings for MIR (discussed briefly in the earlier IR2Vec RFC - https://discourse.llvm.org/t/rfc-enhancing-mlgo-inlining-with-ir2vec-embeddings). The MIR2Vec embeddings are useful in driving target specific optimizations that work on MIR like register allocation.
(Tracking issue - #141817)
|
|
This is an attempt to simplify the rematerialization logic in
InlineSpiller and SplitKit. I'd earlier done the same for
RegisterCoalescer in 57b673.
The basic idea of this change is that we don't need to check whether an
instruction is rematerializable early. Instead, we can defer the check
to the point where we're actually trying to materialize something. We
also don't need to indirect that query through a VNI key, and can
instead just check the instruction directly at the use site.
|
|
An alternative approach to #149732, which sorts the DAG before dumping
it. That approach runs a risk of altering the codegen result as we don't
know if any of the downstream DAG users relies on the node ID, which was
updated as part of the sorting.
The new method proposed by this PR does not update the node ID or any of
the DAG's internal states: the newly added
`SelectionDAG::getTopologicallyOrderedNodes` is a const member function
that returns a list of all nodes in their topological order.
|
|
I have no idea why this was here. The only implementation was AMDGPU,
which was essentially repeating the generic logic but for one specific
case.
|
|
Tolerate setting negative values in tablegen, and store them as a
saturated uint8_t value. This will allow naive uses of the copy cost
to directly add it as a cost without considering the degenerate negative
case. The degenerate negative cases are only used in InstrEmitter / DAG
scheduling, so leave the special-case processing there. There are already
FIXMEs about this system there.
This is the expedient fix for an out-of-tree target regression
after #160084. Currently targets can set a negative copy cost to mark
copies as "impossible". However, essentially all the in-tree uses only
use this for non-allocatable condition registers. We should probably
replace the InstrEmitter/DAG scheduler uses with a more direct check
for a copyable register, but that would involve test changes.
|
|
For a while we have supported the `-aarch64-stack-hazard-size=<size>`
option, which adds "hazard padding" between GPRs and FPR/ZPRs. However,
there is currently a hole in this mitigation as PPR and FPR/ZPR accesses
to the same area also cause streaming memory hazards (this is noted by
`-pass-remarks-analysis=sme -aarch64-stack-hazard-remark-size=<val>`),
and the current stack layout places PPRs and ZPRs within the same area.
Which looks like:
```
------------------------------------ Higher address
| callee-saved gpr registers |
|---------------------------------- |
| lr,fp (a.k.a. "frame record") |
|-----------------------------------| <- fp(=x29)
| <hazard padding> |
|-----------------------------------|
| callee-saved fp/simd/SVE regs |
|-----------------------------------|
| SVE stack objects |
|-----------------------------------|
| local variables of fixed size |
| <FPR> |
| <hazard padding> |
| <GPR> |
------------------------------------| <- sp
| Lower address
```
With this patch the stack (and hazard padding) is rearranged so that
hazard padding is placed between the PPRs and ZPRs rather than within
the (fixed size) callee-save region. Which looks something like this:
```
------------------------------------ Higher address
| callee-saved gpr registers |
|---------------------------------- |
| lr,fp (a.k.a. "frame record") |
|-----------------------------------| <- fp(=x29)
| callee-saved PPRs |
| PPR stack objects | (These are SVE predicates)
|-----------------------------------|
| <hazard padding> |
|-----------------------------------|
| callee-saved ZPR regs | (These are SVE vectors)
| ZPR stack objects | Note: FPRs are promoted to ZPRs
|-----------------------------------|
| local variables of fixed size |
| <FPR> |
| <hazard padding> |
| <GPR> |
------------------------------------| <- sp
| Lower address
```
This layout is only enabled if:
* SplitSVEObjects are enabled (`-aarch64-split-sve-objects`)
- (This may be enabled by default in a later patch)
* Streaming memory hazards are present
- (`-aarch64-stack-hazard-size=<val>` != 0)
* PPRs and FPRs/ZPRs are on the stack
* There's no stack realignment or variable-sized objects
- This is left as a TODO for now
Additionally, any FPR callee-saves that are present will be promoted to
ZPRs. This is to prevent stack hazards between FPRs and GPRs in the
fixed size callee-save area (which would otherwise require more hazard
padding, or moving the FPR callee-saves).
This layout should resolve the hole in the hazard padding mitigation,
and is not intended to change codegen for non-SME code.
|
|
This splits out "ScalablePredicateVector" from the "ScalableVector"
StackID. This is primarily to allow easy differentiation between vectors
and predicates (without inspecting instructions).
This new stack ID is not used in many places yet, but will be used in a
later patch to mark stack slots that are known to contain predicates.
Co-authored-by: Kerry McLaughlin <kerry.mclaughlin@arm.com>
|
|
This commit adds the intrinsic `G_FMODF` to GMIR & enables its
translation, legalization and instruction selection in AArch64.
|
|
This is unused. Targets can lower/expand the `PARTIAL_REDUCE_*` ISD
nodes.
|
|
The isNormalValueType = false flag was not set for this pseudo value
type, which caused significant size increases for some classes: it grew
the TargetLoweringBase class to 1.5 MB, because the size of that class
is quadratic in MVT::VALUETYPE_SIZE, and this commit increased that
from 256 to 504.
Reported by: abadams
Fixes: b05101b ("[TableGen, CodeGen, CHERI] Add support for the cPTR wildcard value type.")
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/161313
|
|
subprogram DIEs" (#160786)
This is an attempt to reland
https://github.com/llvm/llvm-project/pull/159104 with the fix for
https://github.com/llvm/llvm-project/issues/160197.
The original patch had the following problem: when an abstract
subprogram DIE is constructed from within
`DwarfDebug::endFunctionImpl()`,
`DwarfDebug::constructAbstractSubprogramScopeDIE()` honors the `unit:`
field of the DISubprogram. But an abstract subprogram DIE constructed from
`DwarfDebug::beginModule()` was put in the same compile unit to which the
global variable referencing the subprogram belonged, regardless of the
subprogram's `unit:`.
This is fixed by adding `DwarfDebug::getOrCreateAbstractSubprogramCU()`,
used by both `DwarfDebug::constructAbstractSubprogramScopeDIE()` and
`DwarfCompileUnit::getOrCreateSubprogramDIE()` when an abstract subprogram
is queried during the creation of DIEs for globals in
`DwarfDebug::beginModule()`.
The fix and the already-reviewed code from
https://github.com/llvm/llvm-project/pull/159104 are two separate
commits in this PR.
=====
The original commit message follows:
With this change, construction of abstract subprogram DIEs is split in
two stages/functions: creation of DIE (in
DwarfCompileUnit::getOrCreateAbstractSubprogramDIE) and its population
with children (in
DwarfCompileUnit::constructAbstractSubprogramScopeDIE).
With that, abstract subprograms can be created/referenced from
DwarfDebug::beginModule, which should solve the issue with DIE creation
for static local variables of inlined functions with optimized-out
definitions. It fixes https://github.com/llvm/llvm-project/issues/29985.
LexicalScopes class now stores mapping from DISubprograms to their
corresponding llvm::Function's. It is supposed to be built before
processing of each function (so, now LexicalScopes class has a method
for "module initialization" alongside the method for "function
initialization"). It is used by DwarfCompileUnit to determine whether a
DISubprogram needs an abstract DIE before DwarfDebug::beginFunction is
invoked.
DwarfCompileUnit::getOrCreateSubprogramDIE method is added, which can
create an abstract or a concrete DIE for a subprogram. It accepts
llvm::Function* argument to determine whether a concrete DIE must be
created.
This is a temporary fix for
https://github.com/llvm/llvm-project/issues/29985. Ideally, it will be
fixed by moving global variables and types emission to
DwarfDebug::endModule (https://reviews.llvm.org/D144007,
https://reviews.llvm.org/D144005).
Some code proposed by Ellis Hoag <ellis.sparky.hoag@gmail.com> in
https://github.com/llvm/llvm-project/pull/90523 was taken for this
commit.
|