Age | Commit message (Collapse) | Author | Files | Lines |
|
This continues in the direction started by commit 4b81dc7. We
essentially merges the handling for VPLoad - currently in
lowerInterleavedVPLoad - into the existing dedicated routine. This
removes the last use of the dedicate lowerInterleavedVPLoad and thus we
can remove it.
This isn't quite NFC as the main callback has support for the strided
load optimization whereas the VPLoad specific version didn't. So this
adds the ability to form a strided load for a vp.load deinterleave with
one shuffle used.
|
|
The mapping ensures the function is lowered to SPIR-V built-in variables
in SPIR-V. This can fix pre-commit CI fail in https://github.com/intel/llvm/pull/19359
Also add BuiltIn to SPIR-V Builtin function name in __clang_spirv_builtins.h to align with
https://github.com/llvm/llvm-project/blob/main/llvm/docs/SPIRVUsage.rst#builtin-variables
|
|
|
|
Also replace the current static DenseMap of preserved symbol
names in the Symtab hack with this. That was broken statefulness
across compiles, so this at least fixes that. However this is
still broken, llvm-as shouldn't really depend on the triple.
|
|
|
|
getFunction().getParent() already returns Module *.
|
|
Fixes #149179
The issue is that `Builder.CreateGEP` does not return a GEP Instruction
or GEP ContantExpr when the pointer operand is a global variable and all
indices are constant zeroes.
This PR ensures that a GEP instruction is created if `Builder.CreateGEP`
did not return a GEP.
|
|
assertion (#149191)
Fixes #149180
This PR removes an assertion that triggered on valid IR. It has been
replaced with an if statement that returns early if the conditions are
not correct.
This PR also adds GEPs to scalar loads and stores from/to global
variables.
|
|
|
|
(#149392)
|
|
BLAKE3 1.8.2 ( imported in d2ad63a193216d008c8161879a59c5f42e0125cc )
fails to build for the Cygwin target.
see: https://github.com/BLAKE3-team/BLAKE3/issues/494
As a temporary workaround, add `&& !defined(__CYGWIN__)` to BLAKE3
locally.
resolves https://github.com/llvm/llvm-project/issues/148365
|
|
Add helper methods to IR2Vec's Vocabulary class for numeric ID mapping and vocabulary size calculation. These APIs will be useful in triplet generation for `llvm-ir2vec` tool (See #149214).
(Tracking issue - #141817)
|
|
DW_AT_LLVM_stmt_sequence offset (#149376)
It'd be helpful (especially when `llvm-dwarfdump ... | grep
<invalid_address>`) to separate two different invalid reasons for
debugging.
|
|
|
|
|
|
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
|
|
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
|
|
|
|
When there is no VarFixup, VarContentStart is zero.
`slice(F.VarContentStart - Contents.size(), F.getSize())`
might lead to "runtime error: addition of unsigned offset to" in ubsan builds after #148544
|
|
vectors (#142941)
This changes optimizes gather-like sequences, where we load values
separately into lanes of a neon vector. Since each load has serial
dependency, when performing multiple i32 loads into a 128 bit vector for example, it
is more profitable to load into separate vector registers and zip them.
rdar://151851094
|
|
(#147214)
If an AddRec is expanded outside a loop with a single exit block, check
if any of the (lcssa) phi nodes in the exit block match the AddRec. If
that's the case, simply use the existing lcssa phi.
This can reduce the number of instruction created for SCEV expansions,
mainly for runtime checks generated by the loop vectorizer.
Compile-time impact should be mostly neutral
https://llvm-compile-time-tracker.com/compare.php?from=48c7a3187f9831304a38df9bdb3b4d5bf6b6b1a2&to=cf9d039a7b0db5d0d912e0e2c01b19c2a653273a&stat=instructions:u
PR: https://github.com/llvm/llvm-project/pull/147214
|
|
|
|
This extends #134235 and #135194 to vectors.
|
|
- try_emplace(Key) is shorter than insert({Key, nullptr}).
- try_emplace performs value initialization without value parameters.
- We overwrite values on successful insertion anyway.
While we are at it, this patch simplifies the code with structured
binding.
|
|
getHostCPUFeatures constructs and returns a temporary instance of
StringMap<bool>. We don't need const on the return type.
|
|
getTargetLowering() already returns const SITargetLowering *.
|
|
WaitcntBrackets holds per-basic-block information about the state of
wait counters. It also held a bunch of fields that are constant
throughout a run of the pass. This patch moves them out into the
SIInsertWaitcnts class, for better logical separation and to save a tiny
bit of memory.
|
|
|
|
Add special case handling where a new replacement node has the entry
node as an operand i.e. does not depend on any other nodes.
This can be observed with the existing X86/pcsections-atomics.ll test
case when targeting Haswell, where certain 128-bit atomics are
transformed into arch-specific instructions, with some operands having
no other dependencies.
|
|
This patch allows srem by a constant to be expanded more efficiently to
avoid the need for expensive sdiv instructions. This is the last part of
the patches which fixes #118090
|
|
This is a prerequisite for "[AMDGPU] Move common fields out of WaitcntBrackets. NFC. (#148864)"
|
|
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
|
|
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
|
|
DependenceAnalysis checks whether the given addresses are divisible by
the element size of corresponding load/store instructions. However, this
check was only executed when the two instructions (Src and Dst) are
different. We must also perform the same check when Src and Dst are the
same instruction.
Fix the test added in #147715.
|
|
If all slices are small and end up with strided or even vectorization
states, better to not consider these candidates for the vectorization
and try to vectorize the whole bunch as gathered loads.
Reviewers: hiraditya, RKSimon, HanKuanChen
Reviewed By: RKSimon, HanKuanChen
Pull Request: https://github.com/llvm/llvm-project/pull/149209
|
|
The pattern i32xi32->i64, should be matched to the sign-extended
multiply op, instead of explicit sign- extension of the operands
followed by non-widening multiply (this takes 4 operations instead of
one). Currently, if one of the operands of multiply inside a loop is a
constant, the sign-extension of this constant is hoisted out of the loop
by LICM pass and this pattern is not matched by the ISEL.
This change handles multiply operand with Opcode of the type AssertSext
which is seen when the sign-extension is hoisted out-of the loop.
Modifies the DetectUseSxtw() to check for this.
|
|
/llvm-project/llvm/lib/Target/X86/X86CallingConv.cpp:392:12:
error: unused variable 'NumRegs' [-Werror,-Wunused-variable]
unsigned NumRegs = PendingMembers.size();
^
1 error generated.
|
|
extracted fp element (#147043)
|
|
pseudo instructions. (#149104)
Fixes https://github.com/llvm/llvm-project/issues/149034
|
|
|
|
Update VPWidenRecipe::clone() to use the constructor w/o mandatory
Instruction, to facilitate cloning VPWidenRecipe without underlying
instructions.
Split off from https://github.com/llvm/llvm-project/pull/148239.
|
|
The i386 psABI specifies that `__float128` has 16 byte alignment and
must be passed on the stack; however, LLVM currently stores it in a
stack slot that has an offset of 4. Add a custom lowering to correct
this alignment to 16-byte.
i386 does not specify an `__int128`, but it seems reasonable to keep the
same behavior as `__float128` so this is changed as well. There also
isn't a good way to distinguish whether a set of four registers came
from an integer or a float.
The main test demonstrating this change is `store_perturbed` in
`llvm/test/CodeGen/X86/i128-fp128-abi.ll`.
Referenced ABI:
https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/14c05f1b1e156e0e46b61bfa7c1df1e2/intel386-psABI-2020-08-07.pdf
Fixes: https://github.com/llvm/llvm-project/issues/77401
|
|
RegallocBase::cleanupFailedVReg hacks up the state of the liveness in
order to facilitate producing valid IR. During this process, we may end
up producing undef copies.
If the destination of these copies is a spill candidate, we will attempt
to fold the source register when issuing the spill. The undef of the
source is not propagated to storeRegToStackSlot , thus we end up
dropping the undef, issuing a spill, and producing an illegal liveness
state.
This checks for undef copies, and, if found, inserts a kill instead of
spill.
|
|
Add support for big-endian RISC-V ELF files:
- Add riscv32be/riscv64be target architectures to Triple
- Support elf32-bigriscv and elf64-bigriscv output targets in
llvm-objcopy
- Update ELFObjectFile to handle BE RISC-V format strings and
architecture detection
- Add BE RISC-V support to RelocationResolver
- Add tests for new functionality
This is a subset of a bigger RISC-V big-endian support patch, containing
only the llvm-objcopy and libObject changes. Other changes will be added
later.
|
|
This reverts commit 1d398a96dc6b58d15d289c71e2d9f229a0ba719b.
|
|
|
|
|
|
* Fix `.reloc constant` to mean section_symbol+constant instead of
.+constant . The initial .reloc support from MIPS incorrectly
interpreted the offset.
* Delay the evaluation of the offset expression after
MCAssembler::layout, deleting a lot of code working with MCFragment.
* Delete many FIXME from https://reviews.llvm.org/D79625
* Some lld/ELF/Arch/LoongArch.cpp relaxation tests rely on .reloc .,
R_LARCH_ALIGN generating ALIGN relocations at specific location.
Sort the relocations.
|
|
(#149141)
https://github.com/llvm/llvm-project/pull/142551 started always dropping
lifetime markers after moving allocas on the frame, as these are not
useful on non-allocas but can cause issues.
However, this was not done for other ABIs (retcon, retcononce, async)
that go through a different code path. We should treat them the same
way.
|
|
same as https://github.com/llvm/llvm-project/pull/139516
Co-authored-by : Oke, Akshat
<[Akshat.Oke@amd.com](mailto:Akshat.Oke@amd.com)>
|