path: root/llvm/lib
Age Commit message Author Files Lines
4 hours[IA] Support vp.load in lowerInterleavedLoad [nfc-ish] (#149174)Philip Reames9-167/+77
This continues in the direction started by commit 4b81dc7. We essentially merge the handling for VPLoad, currently in lowerInterleavedVPLoad, into the existing dedicated routine. This removes the last use of the dedicated lowerInterleavedVPLoad, so we can remove it. This isn't quite NFC, as the main callback supports the strided load optimization whereas the VPLoad-specific version didn't. So this adds the ability to form a strided load for a vp.load deinterleave with one shuffle used.
4 hours[SPIR-V] Map SPIR-V friendly work-item function to built-in variables (#148567)Wenju He1-1/+26
The mapping ensures the function is lowered to SPIR-V built-in variables. This can fix the pre-commit CI failure in https://github.com/intel/llvm/pull/19359. Also add BuiltIn to the SPIR-V Builtin function name in __clang_spirv_builtins.h to align with https://github.com/llvm/llvm-project/blob/main/llvm/docs/SPIRVUsage.rst#builtin-variables
4 hoursAMDGPU: Handle av imm pseudo in si-fix-sgpr-copies phi fold (#149263)Matt Arsenault1-0/+1
5 hoursRuntimeLibcalls: Add methods to recognize libcall names (#149001)Matt Arsenault2-23/+67
Also replace the current static DenseMap of preserved symbol names in the Symtab hack with this. The static map carried state across compiles, which was broken, so this at least fixes that. However, this is still broken: llvm-as shouldn't really depend on the triple.
5 hours[AMDGPU] More flatGVS gfx1250 patterns (#149410)Stanislav Mekhanoshin1-13/+20
6 hours[Target] Remove unnecessary casts (NFC) (#149342)Kazu Hirata2-3/+2
getFunction().getParent() already returns Module *.
7 hours[DirectX] Fix GEP flattening with 0-indexed GEPs on global variables (#149211)Deric C.1-0/+10
Fixes #149179. The issue is that `Builder.CreateGEP` does not return a GEP Instruction or GEP ConstantExpr when the pointer operand is a global variable and all indices are constant zeroes. This PR ensures that a GEP instruction is created if `Builder.CreateGEP` did not return a GEP.
7 hours[DirectX] Add a GEP to scalar load/store on globals and remove incorrect assertion (#149191)Deric C.1-14/+22
Fixes #149180. This PR removes an assertion that triggered on valid IR. It has been replaced with an if statement that returns early if the conditions are not correct. This PR also adds GEPs to scalar loads and stores from/to global variables.
7 hours[AMDGPU] Remove unused VGLOBAL_Real_AllAddr_gfx12. NFC. (#149398)Stanislav Mekhanoshin1-7/+0
7 hours[AMDGPU] add tests for Change FLAT SADDR to VADDR form in moveToVALU. NFC. (#149392)Stanislav Mekhanoshin1-0/+1
7 hours[Support/BLAKE3] quick fix for Cygwin build (#148635)Tomohiro Kashiwada2-2/+2
BLAKE3 1.8.2 (imported in d2ad63a193216d008c8161879a59c5f42e0125cc) fails to build for the Cygwin target. See https://github.com/BLAKE3-team/BLAKE3/issues/494. As a temporary workaround, add `&& !defined(__CYGWIN__)` to BLAKE3 locally. Resolves https://github.com/llvm/llvm-project/issues/148365
8 hours[IR2Vec][NFC] Add helper methods for numeric ID mapping in Vocabulary (#149212)S. VenkataKeerthy1-2/+18
Add helper methods to IR2Vec's Vocabulary class for numeric ID mapping and vocabulary size calculation. These APIs will be useful in triplet generation for `llvm-ir2vec` tool (See #149214). (Tracking issue - #141817)
8 hours[DWARFLinker] Use different addresses to distinguish invalid DW_AT_LLVM_stmt_sequence offset (#149376)Peter Rong1-2/+8
When debugging, it is helpful (especially when running `llvm-dwarfdump ... | grep <invalid_address>`) to tell apart the two different reasons an offset can be invalid.
8 hours[SelectionDAG] Fix misplaced commas in operand bundle errors (#149331)Fraser Cormack1-7/+5
9 hours[NFC] simplify LowerAllowCheckPass::printPipeline (#149374)Florian Mayer1-13/+6
9 hours[AMDGPU] Add support for `v_tanh_f32` on gfx1250 (#149360)Shilei Tian4-0/+18
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
10 hours[AMDGPU] Add support for `v_cos_bf16` on gfx1250 (#149355)Shilei Tian1-0/+2
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
10 hours[NVPTX] Add PRMT constant folding and cleanup usage of PRMT node (#148906)Alex MacLean3-82/+203
11 hoursMCAssembler: Modify Contents when VarFixups is not emptyFangrui Song1-7/+11
When there is no VarFixup, VarContentStart is zero. `slice(F.VarContentStart - Contents.size(), F.getSize())` might lead to "runtime error: addition of unsigned offset to" in ubsan builds after #148544
12 hours[AArch64][Machine-Combiner] Split gather patterns into neon regs to multiple vectors (#142941)Jonathan Cohen2-0/+269
This change optimizes gather-like sequences, where we load values separately into lanes of a neon vector. Since each load has a serial dependency, when performing multiple i32 loads into a 128-bit vector, for example, it is more profitable to load into separate vector registers and zip them. rdar://151851094
14 hours[SCEV] Try to re-use existing LCSSA phis when expanding SCEVAddRecExpr. (#147214)Florian Hahn1-0/+23
If an AddRec is expanded outside a loop with a single exit block, check if any of the (lcssa) phi nodes in the exit block match the AddRec. If that's the case, simply use the existing lcssa phi. This can reduce the number of instructions created for SCEV expansions, mainly for runtime checks generated by the loop vectorizer. Compile-time impact should be mostly neutral: https://llvm-compile-time-tracker.com/compare.php?from=48c7a3187f9831304a38df9bdb3b4d5bf6b6b1a2&to=cf9d039a7b0db5d0d912e0e2c01b19c2a653273a&stat=instructions:u PR: https://github.com/llvm/llvm-project/pull/147214
14 hours[RISCV][IA] Rearrange code for readability and ease of merge [nfc]Philip Reames1-33/+33
14 hours[DAGCombiner] Fold vector subtraction if above threshold to `umin` (#148834)Piotr Fusik1-33/+54
This extends #134235 and #135194 to vectors.
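For readers who have not seen the scalar version of this fold, here is a minimal sketch. It assumes the pattern in question is `x >= y ? x - y : x` rewritten as `umin(x, x - y)` on unsigned values, which is what the title suggests; the exact patterns matched are defined in the PRs themselves. The function names are made up for illustration, and the check runs over 8-bit values so it can be exhaustive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Select form: subtract only when x is at or above the threshold y.
uint8_t selectForm(uint8_t X, uint8_t Y) {
  return X >= Y ? static_cast<uint8_t>(X - Y) : X;
}

// umin form: when X < Y the unsigned subtraction wraps around to a value
// larger than X, so taking the minimum picks X; otherwise it picks X - Y.
uint8_t uminForm(uint8_t X, uint8_t Y) {
  return std::min<uint8_t>(X, static_cast<uint8_t>(X - Y));
}

// Exhaustively compare both forms over all 8-bit operand pairs.
bool formsAgree() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y)
      if (selectForm(static_cast<uint8_t>(X), static_cast<uint8_t>(Y)) !=
          uminForm(static_cast<uint8_t>(X), static_cast<uint8_t>(Y)))
        return false;
  return true;
}
```

The same argument applies lane by lane to vectors, which is presumably why the fold extends naturally once the target has a vector `umin`.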
14 hours[llvm] Use *Map::try_emplace (NFC) (#149257)Kazu Hirata1-3/+2
- try_emplace(Key) is shorter than insert({Key, nullptr}).
- try_emplace performs value initialization without value parameters.
- We overwrite values on successful insertion anyway.
While we are at it, this patch simplifies the code with structured binding.
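The points above can be sketched with the standard library. This is an illustration only: it uses `std::unordered_map` rather than LLVM's map types, on the assumption that their `try_emplace` behaves comparably, and `Cache` and `getOrCreate` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

using Cache = std::unordered_map<std::string, std::unique_ptr<int>>;

// Look up Key, creating the value only if the key was not present yet.
int &getOrCreate(Cache &C, const std::string &Key, int Init) {
  // try_emplace(Key) value-initializes the mapped unique_ptr (to null) on
  // insertion, so there is no need to spell out insert({Key, nullptr}).
  auto [It, Inserted] = C.try_emplace(Key);
  if (Inserted)
    It->second = std::make_unique<int>(Init); // overwrite the placeholder
  return *It->second;
}
```

A second call with the same key returns the existing value and ignores `Init`, since `Inserted` is false.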
14 hours[TargetParser] Remove const from a return type (NFC) (#149255)Kazu Hirata1-6/+6
getHostCPUFeatures constructs and returns a temporary instance of StringMap<bool>. We don't need const on the return type.
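The commit message does not spell out why the `const` is unwanted. One common reason, sketched here with a hypothetical `Counted` type rather than `StringMap<bool>`, is that a `const` by-value return cannot bind to a move assignment operator, so assigning the result falls back to a copy.

```cpp
#include <cassert>

// Hypothetical type that counts which assignment operator was used.
struct Counted {
  int Copies = 0;
  int Moves = 0;
  Counted() = default;
  Counted(const Counted &) = default;
  Counted(Counted &&) = default;
  Counted &operator=(const Counted &) { ++Copies; return *this; }
  Counted &operator=(Counted &&) noexcept { ++Moves; return *this; }
};

const Counted makeConst() { return Counted(); } // const temporary
Counted makePlain() { return Counted(); }       // ordinary temporary

// A const rvalue can only bind to the copy assignment operator.
Counted assignFromConst() {
  Counted C;
  C = makeConst();
  return C;
}

// A non-const rvalue selects the move assignment operator.
Counted assignFromPlain() {
  Counted C;
  C = makePlain();
  return C;
}
```

Initialization from the return value (`Counted C = makeConst();`) is elided either way in C++17, so the difference only shows up on assignment.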
14 hours[AMDGPU] Remove an unnecessary cast (NFC) (#149254)Kazu Hirata1-2/+1
getTargetLowering() already returns const SITargetLowering *.
14 hours[AMDGPU] Move common fields out of WaitcntBrackets. NFC. (#148864)Jay Foad1-63/+59
WaitcntBrackets holds per-basic-block information about the state of wait counters. It also held a bunch of fields that are constant throughout a run of the pass. This patch moves them out into the SIInsertWaitcnts class, for better logical separation and to save a tiny bit of memory.
14 hours[RISCV] Teach SelectAddrRegRegScale that ADD is commutable. (#149231)Craig Topper1-8/+19
15 hours[SelectionDAG] Fix copyExtraInfo where new node has entry as operand (#149307)Marco Elver1-1/+8
Add special-case handling where a new replacement node has the entry node as an operand, i.e. does not depend on any other nodes. This can be observed with the existing X86/pcsections-atomics.ll test case when targeting Haswell, where certain 128-bit atomics are transformed into arch-specific instructions, with some operands having no other dependencies.
15 hours[GlobalISel] Allow expansion of srem by constant in prelegalizer (#148845)jyli01161-16/+29
This patch allows srem by a constant to be expanded more efficiently, avoiding the need for expensive sdiv instructions. This is the last of the patches that fix #118090.
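As a rough picture of what such an expansion looks like, here is a scalar sketch of the usual multiply-high technique for a signed remainder by the constant 3. It is not the GlobalISel code; the divisor, the function names, and the choice of 32-bit operands are assumptions made for the example. The right shift of a negative 64-bit product is assumed to be arithmetic, which C++20 guarantees and mainstream compilers already do.

```cpp
#include <cassert>
#include <cstdint>

// Truncating signed division by 3 without a divide instruction.
int32_t sdiv3(int32_t N) {
  const int64_t Magic = 0x55555556; // ceil(2^32 / 3)
  // High 32 bits of the 64-bit product approximate floor(N / 3).
  int32_t Q = static_cast<int32_t>((Magic * N) >> 32);
  // Add 1 for negative N so the result rounds toward zero.
  Q += static_cast<int32_t>(static_cast<uint32_t>(N) >> 31);
  return Q;
}

// Remainder from quotient: N - (N / 3) * 3.
int32_t srem3(int32_t N) { return N - sdiv3(N) * 3; }

// Compare against the built-in operator over an inclusive range.
bool srem3MatchesOperator(int32_t Lo, int32_t Hi) {
  for (int64_t N = Lo; N <= Hi; ++N)
    if (srem3(static_cast<int32_t>(N)) != static_cast<int32_t>(N % 3))
      return false;
  return true;
}
```

The multiply, shifts, add, and subtract replace the division; other divisors need a different magic constant and sometimes an extra shift or add.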
15 hours[AMDGPU] Move class WaitcntBrackets after class SIInsertWaitcnts. NFC.Jay Foad1-234/+236
This is a prerequisite for "[AMDGPU] Move common fields out of WaitcntBrackets. NFC. (#148864)"
16 hours[AMDGPU] Add support for `v_sin_bf16` on gfx1250 (#149241)Shilei Tian1-0/+2
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
16 hours[AMDGPU] Add support for `v_exp_bf16` on gfx1250 (#149229)Shilei Tian1-0/+2
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
16 hours[DA] Check element size when analyzing deps between same instruction (#148813)Ryotaro Kasuga1-8/+6
DependenceAnalysis checks whether the given addresses are divisible by the element size of corresponding load/store instructions. However, this check was only executed when the two instructions (Src and Dst) are different. We must also perform the same check when Src and Dst are the same instruction. Fix the test added in #147715.
16 hours[SLP]Do not consider non-profitable loads slicesAlexey Bataev1-0/+9
If all slices are small and end up with strided or even vectorization states, it is better not to consider these candidates for vectorization and instead try to vectorize the whole bunch as gathered loads.
Reviewers: hiraditya, RKSimon, HanKuanChen
Reviewed By: RKSimon, HanKuanChen
Pull Request: https://github.com/llvm/llvm-project/pull/149209
16 hours[HEXAGON] Add AssertSext in sign-extended mpy (#149061)Abinaya Saravanan1-0/+9
The pattern i32xi32->i64 should be matched to the sign-extended multiply op, instead of explicit sign-extension of the operands followed by a non-widening multiply (this takes 4 operations instead of one). Currently, if one of the operands of a multiply inside a loop is a constant, the sign-extension of this constant is hoisted out of the loop by the LICM pass and this pattern is not matched by ISEL. This change handles a multiply operand whose opcode is AssertSext, which is seen when the sign-extension is hoisted out of the loop. Modifies DetectUseSxtw() to check for this.
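At the source level, the pattern being discussed looks roughly like the sketch below. The function names and the loop are hypothetical; whether a given compiler and target emit a single widening multiply for them depends on the backend, which is what this commit adjusts for Hexagon.

```cpp
#include <cassert>
#include <cstdint>

// i32 x i32 -> i64: both operands are sign-extended, then multiplied.
// A backend can match this to one sign-extended (widening) multiply.
int64_t widenThenMultiply(int32_t A, int32_t B) {
  return static_cast<int64_t>(A) * static_cast<int64_t>(B);
}

// Scale is loop-invariant, so an optimizer may hoist its sign-extension
// out of the loop, leaving the multiply with one pre-extended operand.
int64_t sumScaled(const int32_t *Data, int Count, int32_t Scale) {
  int64_t Sum = 0;
  for (int I = 0; I < Count; ++I)
    Sum += static_cast<int64_t>(Data[I]) * static_cast<int64_t>(Scale);
  return Sum;
}
```

The widening matters for correctness as well as speed: `widenThenMultiply(INT32_MIN, -1)` is representable in 64 bits, while a 32-bit multiply of the same operands would overflow.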
17 hours[X86] Fix an unused-variable warning (NFC)Jie Fu1-2/+1
/llvm-project/llvm/lib/Target/X86/X86CallingConv.cpp:392:12: error: unused variable 'NumRegs' [-Werror,-Wunused-variable]
  unsigned NumRegs = PendingMembers.size();
           ^
1 error generated.
17 hours[LoongArch] Optimize inserting bitcasted integer element or bitcasting extracted fp element (#147043)ZhaoQi3-1/+21
18 hours[LLVM][AArch64ExpandPseudo] Preserve undef flags when expanding SVE 1/2/3-op pseudo instructions. (#149104)Paul Walker1-6/+12
Fixes https://github.com/llvm/llvm-project/issues/149034
19 hours[LoongArch] Optimize inserting element to high part of 256bits vector (#146816)ZhaoQi1-3/+2
19 hours[VPlan] Allow cloning of VPWidenRecipe without underlying instr (NFC).Florian Hahn1-4/+6
Update VPWidenRecipe::clone() to use the constructor w/o mandatory Instruction, to facilitate cloning VPWidenRecipe without underlying instructions. Split off from https://github.com/llvm/llvm-project/pull/148239.
19 hours[X86] Align f128 and i128 to 16 bytes when passing on x86-32 (#138092)Trevor Gross3-3/+49
The i386 psABI specifies that `__float128` has 16 byte alignment and must be passed on the stack; however, LLVM currently stores it in a stack slot that has an offset of 4. Add a custom lowering to correct this alignment to 16-byte. i386 does not specify an `__int128`, but it seems reasonable to keep the same behavior as `__float128` so this is changed as well. There also isn't a good way to distinguish whether a set of four registers came from an integer or a float. The main test demonstrating this change is `store_perturbed` in `llvm/test/CodeGen/X86/i128-fp128-abi.ll`. Referenced ABI: https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/14c05f1b1e156e0e46b61bfa7c1df1e2/intel386-psABI-2020-08-07.pdf Fixes: https://github.com/llvm/llvm-project/issues/77401
19 hours[TII] Do not fold undef copies (#147392)Jeffrey Byrnes1-5/+11
RegallocBase::cleanupFailedVReg hacks up the state of the liveness in order to facilitate producing valid IR. During this process, we may end up producing undef copies. If the destination of these copies is a spill candidate, we will attempt to fold the source register when issuing the spill. The undef of the source is not propagated to storeRegToStackSlot, so we end up dropping the undef, issuing a spill, and producing an illegal liveness state. This checks for undef copies and, if found, inserts a kill instead of a spill.
20 hours[llvm-objcopy][libObject] Add RISC-V big-endian support (#146913)Djordje Todorovic2-147/+183
Add support for big-endian RISC-V ELF files:
- Add riscv32be/riscv64be target architectures to Triple
- Support elf32-bigriscv and elf64-bigriscv output targets in llvm-objcopy
- Update ELFObjectFile to handle BE RISC-V format strings and architecture detection
- Add BE RISC-V support to RelocationResolver
- Add tests for new functionality
This is a subset of a bigger RISC-V big-endian support patch, containing only the llvm-objcopy and libObject changes. Other changes will be added later.
20 hoursRevert "[GVN][NFC] Use early return in phiTranslateImpl() (#149268)" (#149270)Madhur Amilkanthwar1-9/+6
This reverts commit 1d398a96dc6b58d15d289c71e2d9f229a0ba719b.
21 hours[GVN][NFC] Use early return in phiTranslateImpl() (#149268)Madhur Amilkanthwar1-6/+9
21 hours[LoongArch] Optimize inserting extracted elements (#146018)ZhaoQi3-10/+126
21 hoursMC: Rework .reloc directive and fix the offset when it evaluates to a constantFangrui Song8-160/+56
* Fix `.reloc constant` to mean section_symbol+constant instead of .+constant. The initial .reloc support from MIPS incorrectly interpreted the offset.
* Delay the evaluation of the offset expression until after MCAssembler::layout, deleting a lot of code working with MCFragment.
* Delete many FIXMEs from https://reviews.llvm.org/D79625
* Some lld/ELF/Arch/LoongArch.cpp relaxation tests rely on .reloc ., R_LARCH_ALIGN generating ALIGN relocations at specific locations. Sort the relocations.
21 hours[Coroutines] Always drop lifetime markers after moving allocas to frame (#149141)Nikita Popov1-0/+7
https://github.com/llvm/llvm-project/pull/142551 started always dropping lifetime markers after moving allocas to the frame, as these are not useful on non-allocas but can cause issues. However, this was not done for other ABIs (retcon, retcononce, async) that go through a different code path. We should treat them the same way.
21 hours[AMDGPU][NPM] Fill in addPreSched2 passes (#148112)Vikram Hegde2-0/+7
Same as https://github.com/llvm/llvm-project/pull/139516. Co-authored-by: Oke, Akshat <Akshat.Oke@amd.com>