path: root/llvm/lib
Age | Commit message | Author | Files | Lines
2025-09-19 | Revert "[ELF][LLDB] Add an nvsass triple (#159459)" (#159879) | Joseph Huber | 4 | -19/+3
Summary: This patch has broken the `libc` build bot. I could work around that but the changes seem unnecessary. This reverts commit 9ba844eb3a21d461c3adc7add7691a076c6992fc.
2025-09-20 | X86: Elide use of RegClassByHwMode in some ptr_rc_tailcall uses (#159874) | Matt Arsenault | 2 | -4/+4
Different instructions are used for the 32-bit and 64-bit cases anyway, so directly use the concrete register class in the instruction.
2025-09-20 | [M68k] Remove STI from M68kAsmParser (#159827) | Sergei Barannikov | 1 | -3/+2
STI exists in the base class, use it instead. Fixes #159862.
2025-09-19 | Reland [BasicBlockUtils] Handle funclets when detaching EH pad blocks (#159379) | Gábor Spaits | 1 | -28/+69
Fixes #148052. The last PR did not account for the scenario where more than one instruction used the `catchpad` label. In that case I deleted uses that had already been "chosen to be iterated over" by the early-increment iterator. This issue was not visible in a normal x86 release build, but the address-sanitizer buildbot later caught it. Here is the diff from the last version of this PR (#158435):
```diff
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 91e245e5e8f5..1dd8cb4ee584 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -106,7 +106,8 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
     // first block, the we would have possible cleanupret and catchret
     // instructions with poison arguments, which wouldn't be valid.
     if (isa<FuncletPadInst>(I)) {
-      for (User *User : make_early_inc_range(I.users())) {
+      SmallPtrSet<BasicBlock *, 4> UniqueEHRetBlocksToDelete;
+      for (User *User : I.users()) {
         Instruction *ReturnInstr = dyn_cast<Instruction>(User);
         // If we have a cleanupret or catchret block, replace it with just an
         // unreachable. The other alternative, that may use a catchpad is a
@@ -114,33 +115,12 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
         if (isa<CatchReturnInst>(ReturnInstr) ||
             isa<CleanupReturnInst>(ReturnInstr)) {
           BasicBlock *ReturnInstrBB = ReturnInstr->getParent();
-          // This catchret or catchpad basic block is detached now. Let the
-          // successors know it.
-          // This basic block also may have some predecessors too. For
-          // example the following LLVM-IR is valid:
-          //
-          // [cleanuppad_block]
-          // |
-          // [regular_block]
-          // |
-          // [cleanupret_block]
-          //
-          // The IR after the cleanup will look like this:
-          //
-          // [cleanuppad_block]
-          // |
-          // [regular_block]
-          // |
-          // [unreachable]
-          //
-          // So regular_block will lead to an unreachable block, which is also
-          // valid. There is no need to replace regular_block with unreachable
-          // in this context now.
-          // On the other hand, the cleanupret/catchret block's successors
-          // need to know about the deletion of their predecessors.
-          emptyAndDetachBlock(ReturnInstrBB, Updates, KeepOneInputPHIs);
+          UniqueEHRetBlocksToDelete.insert(ReturnInstrBB);
        }
      }
+      for (BasicBlock *EHRetBB :
+           make_early_inc_range(UniqueEHRetBlocksToDelete))
+        emptyAndDetachBlock(EHRetBB, Updates, KeepOneInputPHIs);
    }
  }
```
2025-09-19 | [MCA] Enable customization of individual instructions (#155420) | Roman Belenov | 2 | -3/+48
Currently MCA takes instruction properties from the scheduling model. However, some instructions may execute differently depending on external factors; for example, the latency of memory instructions may vary depending on whether the load comes from L1 cache, L2, or DRAM. While MCA, as a static analysis tool, cannot model such differences (and currently takes a fixed decision, e.g. all memory ops are treated as L1 accesses), it makes sense to allow manual modification of instruction properties to model different behavior (e.g. the sensitivity of code performance to cache misses on a particular load instruction). This patch addresses this need.

The library modification is intentionally generic: arbitrary modifications to InstrDesc are allowed. The tool support is currently limited to changing instruction latencies (a single number applied to all output arguments and MaxLatency) via comments in the input assembler code; the format is like this:

add (%eax), eax // LLVM-MCA-LATENCY:100

Users of the MCA library can already make additional customizations; the command-line tool can be extended in the future. Note that InstructionView currently shows per-instruction information according to the scheduling model and is not affected by this change.

See https://github.com/llvm/llvm-project/issues/133429 for additional clarifications (including an explanation of why existing customization mechanisms do not provide the required functionality).

---------

Co-authored-by: Min-Yih Hsu <min@myhsu.dev>
2025-09-19 | [SampleProfile] Always use FAM to get ORE | Aiden Grossman | 1 | -14/+9
The split in this code path was left over from when we had to support the old PM and the new PM at the same time. Now that the legacy pass has been dropped, this simplifies the code a little bit and swaps pointers for references in a couple places. Reviewers: aeubanks, efriedma-quic, wlei-llvm Reviewed By: aeubanks Pull Request: https://github.com/llvm/llvm-project/pull/159858
2025-09-19 | [RISCV] Update comments in RISCVMatInt to reflect we don't always use ADDIW after LUI now. NFC (#159829) | Craig Topper | 1 | -14/+15
The simm32 base case only uses lui+addiw when necessary after 3d2650bdeb8409563d917d8eef70b906323524ef. The worst-case 8-instruction sequence doesn't leave a full 32 bits for the LUI+ADDI(W) after the 3 12-bit ADDI and SLLI pairs are created, so we will never generate LUI+ADDIW in the worst-case sequence.
2025-09-19 | [SROA] Use tree-structure merge to remove alloca (#152793) | Chengjun | 1 | -7/+306
This patch introduces a new optimization in SROA that handles the pattern where multiple non-overlapping vector `store`s completely fill an `alloca`.

The current approach to handle this pattern introduces many `.vecexpand` and `.vecblend` instructions, which can dramatically slow down compilation when dealing with large `alloca`s built from many small vector `store`s. For example, consider an `alloca` of type `<128 x float>` filled by 64 `store`s of `<2 x float>` each. The current implementation requires:
- 64 `shufflevector`s (`.vecexpand`)
- 64 `select`s (`.vecblend`)
- All operations use masks of size 128
- These operations form a long dependency chain

This kind of IR is both difficult to optimize and slow to compile, particularly impacting the `InstCombine` pass. This patch introduces a tree-structured merge approach that significantly reduces the number of operations and improves compilation performance. Key features:
- Detects when vector `store`s completely fill an `alloca` without gaps
- Ensures no loads occur in the middle of the store sequence
- Uses a tree-based approach with `shufflevector`s to merge stored values
- Reduces the number of intermediate operations compared to linear merging
- Eliminates the long dependency chains that hurt optimization

Example transformation:
```
// Before: (stores do not have to be in order)
%alloca = alloca <8 x float>
store <2 x float> %val0, ptr %alloca     ; offset 0-1
store <2 x float> %val2, ptr %alloca+16  ; offset 4-5
store <2 x float> %val1, ptr %alloca+8   ; offset 2-3
store <2 x float> %val3, ptr %alloca+24  ; offset 6-7
%result = load <8 x float>, ptr %alloca

// After (tree-structured merge):
%shuffle0 = shufflevector %val0, %val1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%shuffle1 = shufflevector %val2, %val3, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%result = shufflevector %shuffle0, %shuffle1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
```

Benefits:
- Logarithmic depth (O(log n)) instead of linear dependency chains
- Fewer total operations for large vectors
- Better optimization opportunities for subsequent passes
- Significant compilation time improvements for large vector patterns

For some large cases, the compile time can be reduced from about 60s to less than 3s.

---------

Co-authored-by: chengjunp <chengjunp@nividia.com>
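For illustration, here is a rough sketch of the tree-merge idea using IRBuilder. It assumes a power-of-two number of equally sized pieces listed in offset order; it is not the actual SROA implementation.

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/Support/MathExtras.h"
#include <cassert>
#include <numeric>
using namespace llvm;

// Pairwise-concatenate adjacent stored vectors with shufflevector until a
// single value covering the whole alloca remains, giving O(log n) depth.
static Value *treeMerge(IRBuilder<> &B, SmallVectorImpl<Value *> &Pieces) {
  assert(isPowerOf2_64(Pieces.size()) && "sketch assumes a power-of-two count");
  while (Pieces.size() > 1) {
    SmallVector<Value *> Next;
    for (unsigned I = 0, E = Pieces.size(); I < E; I += 2) {
      unsigned NumElts =
          cast<FixedVectorType>(Pieces[I]->getType())->getNumElements();
      SmallVector<int> Mask(2 * NumElts);
      std::iota(Mask.begin(), Mask.end(), 0); // concatenate the two halves
      Next.push_back(B.CreateShuffleVector(Pieces[I], Pieces[I + 1], Mask));
    }
    Pieces.assign(Next.begin(), Next.end());
  }
  return Pieces.front();
}
```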
2025-09-19 | [DependenceAnalysis] Extending SIV to handle fusable loops (#128782) | Alireza Torabian | 1 | -158/+303
When there is a dependency between two memory instructions in separate loops that have the same iteration space and depth, SIV will be able to test them and compute the direction and the distance of the dependency.
2025-09-19 | [CodeGenPrepare] Consider target memory intrinsics as memory use (#159638) | Jeffrey Byrnes | 1 | -0/+13
When deciding to sink address instructions into their uses, we check if it is profitable to do so. The profitability check is based on the types of uses of this address instruction -- if there are users which are not memory instructions, then do not fold. However, this profitability check wasn't considering target intrinsics, which may be loads / stores. This adds some logic to handle target memory intrinsics.
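A rough sketch of the kind of check the message describes, using TTI's existing query for target memory intrinsics; `isMemoryUse` is a hypothetical helper, not the actual CodeGenPrepare code.

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

// Treat a user as a memory use if it is a plain load/store, or a target
// intrinsic that TTI reports as touching memory.
static bool isMemoryUse(Instruction *User, const TargetTransformInfo &TTI) {
  if (isa<LoadInst>(User) || isa<StoreInst>(User))
    return true;
  if (auto *II = dyn_cast<IntrinsicInst>(User)) {
    MemIntrinsicInfo Info;
    return TTI.getTgtMemIntrinsic(II, Info);
  }
  return false;
}
```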
2025-09-19 | Revert "[PowerPC] clean unused PPC target feature FeatureBPERMD" (#159837) | Sergei Barannikov | 1 | -1/+4
Reverts llvm/llvm-project#159782. The PR breaks multiple build bots as well as CI.
2025-09-19 | [KnownBits] Add setAllConflict to set all bits in Zero and One. NFC (#159815) | Craig Topper | 4 | -36/+21
This is a common pattern used to initialize KnownBits before loops that call intersectWith.
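As a minimal sketch of that pattern (assuming the `setAllConflict()` name from the commit and a caller that already has per-element `KnownBits`; not one of the call sites the patch touches):

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// Intersect the known bits of several elements. The "conflict" state (every
// bit set in both Zero and One) acts as the identity for intersectWith, so it
// is the natural seed value before the loop.
static KnownBits intersectAll(ArrayRef<KnownBits> Elts, unsigned BitWidth) {
  KnownBits Result(BitWidth);
  Result.setAllConflict(); // previously spelled as Result.Zero.setAllBits();
                           //                       Result.One.setAllBits();
  for (const KnownBits &K : Elts)
    Result = Result.intersectWith(K);
  return Result;
}
```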
2025-09-19 | [LV] Pass operand info to getMemoryOpCost in getMemInstScalarizationCost. | Florian Hahn | 1 | -3/+4
Pass operand info to getMemoryOpCost in getMemInstScalarizationCost. This matches the behavior in VPReplicateRecipe::computeCost.
2025-09-19 | [AArch64] Clean up the formatting of some bitconvert patterns. NFC | David Green | 1 | -145/+144
2025-09-19 | [ARM] Replace ABS and tABS machine nodes with custom lowering (#156717) | AZero13 | 8 | -151/+60
Just do a custom lowering instead. Also copy-paste the cmov-neg fold to prevent regressions in nabs.
2025-09-19 | [ELF][LLDB] Add an nvsass triple (#159459) | Walter Erquinigo | 4 | -3/+19
When handling CUDA ELF files via objdump or LLDB, the ELF parser in LLVM needs to distinguish if an ELF file is sass or not, which requires a triple for sass to exist in llvm. This patch includes all the necessary changes for LLDB and objdump to correctly identify these files with the correct triple.
2025-09-19 | [PowerPC] clean unused PPC target feature FeatureBPERMD (#159782) | zhijian lin | 1 | -4/+1
Clean up the unused PPC target feature FeatureBPERMD.
2025-09-19 | [ARM] Verify that disassembled instruction is correct (#157360) | Sergei Barannikov | 1 | -41/+27
This change adds basic `MCInst` verification (checks the number of operands) and fixes detected bugs.
* `RFE*` instructions have only one operand, but `DecodeRFEInstruction` added two.
* `DecodeMVEModImmInstruction` and `DecodeMVEVCMP` added a `vpred` operand, but this is what `AddThumbPredicate` normally does. This resulted in an extra `vpred` operand.
* `DecodeMVEVADCInstruction` added an extra immediate operand.
* `getARMInstruction` added a `pred` operand to instructions that don't have one (via `DecodePredicateOperand`).
* `AddThumb1SBit` appended an extra register operand to instructions that don't modify CPSR (such as `tBL`).
* Instructions in the `NEONDup` namespace have a `pred` operand that the generated code successfully decodes. The operand was added once again by `getARMInstruction`/`getThumbInstruction` via `AddThumbPredicate`.
Functional changes extracted from #156540.
2025-09-19 | [PassBuilder] Add callback invoking to PassBuilder string API (#157153) | Gabriel Baraldi | 2 | -21/+144
This is a very rough sketch of what this can look like, but I didn't want to spend too much time on what could be a dead end. Currently the only way to invoke callbacks is by using the default pipelines. This is an issue if you want to define your own pipeline using the C string API (we do that in LLVM.jl in Julia), so I extended the API to allow invoking those callbacks just like one would invoke a pass of that kind. There are some open questions about the parameters these callbacks take, and some callbacks are still missing (some are also invoked by the backend, so we may not want to expose them). Code written with AI help; bugs are mine. (Not sure what the policy for this is in LLVM.)
2025-09-19 | [RISCV] Use MutableArrayRef instead of SmallVectorImpl&. NFC (#159651) | Craig Topper | 1 | -2/+2
We're only going to modify existing items, not add elements to or remove elements from the vector.
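As a rough illustration (a hypothetical helper, not the function the patch changes), `MutableArrayRef` expresses exactly that contract in the signature:

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
using namespace llvm;

// The callee can rewrite elements in place but has no way to grow or shrink
// the underlying container, unlike a SmallVectorImpl<int>& parameter.
static void doubleAll(MutableArrayRef<int> Vals) {
  for (int &V : Vals)
    V *= 2;
  // Vals.push_back(1); // would not compile: MutableArrayRef cannot resize
}

int main() {
  SmallVector<int> V = {1, 2, 3};
  doubleAll(V); // SmallVector converts implicitly to MutableArrayRef<int>
  return 0;
}
```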
2025-09-19 | [AArch64] Remove post-decoding instruction mutations (#156364) | Sergei Barannikov | 6 | -78/+153
Add `bits<0>` fields to instructions using the ZTR/MPR/MPR8 register classes. These register classes contain only one register, and it is not encoded in the instruction. This way, the generated decoder can completely decode instructions without having to perform a post-decoding pass to insert missing operands. Some immediate operands are also not encoded and have only one possible value "zero". Use this trick for them, too. Finally, remove `-ignore-non-decodable-operands` option from `llvm-tblgen` invocation to ensure that non-decodable operands do not appear in the future.
2025-09-19 | [WebAssembly] Require tags for Wasm EH and Wasm SJLJ to be defined externally (#159143) | Sam Clegg | 3 | -31/+1
Rather than defining these tags in each object file that requires them, we can declare them as undefined and require that they be defined externally in, for example, compiler-rt or libcxxabi.
2025-09-19 | [AMDGPU]: Unpack packed instructions overlapped by MFMAs post-RA scheduling (#157968) | Akash Dutta | 3 | -5/+398
This is a cleaned-up version of PR #151704. These optimizations are now performed post-RA scheduling.
2025-09-19 | [RISCV] Re-work how VWADD_W_VL and similar _W_VL nodes are handled in combineOp_VLToVWOp_VL. (#159205) | Craig Topper | 1 | -37/+49
These instructions have one already-narrow operand. Previously, we pretended that this operand was a supported extension. This could cause problems when we called getOrCreateExtendedOp on this narrow operand when creating the VWADD_VL. If the narrow operand happened to be an extend of the opposite type, we would peek through it and then rebuild it with the wrong extension type. So (vwadd_w_vl (i32 (sext X)), (i16 (zext Y))) would become (vwadd_vl (i16 (sext X)), (i16 (sext Y))). To prevent this, we now ignore the operand instead and pass std::nullopt for SupportsExt to getOrCreateExtendedOp so it won't peek through any extends on the narrow source. Fixes #159152.
2025-09-19 | [RISCV] Fix build after e747223c03e16d02cd0dc6f8eedb5c825a7366c1 | Michael Liao | 1 | -2/+2
2025-09-19 | [NFC][RISCV] Move Zvfbf*-related stuff to RISCVInstrInfoZvfbf.td (#159619) | Brandon Wu | 5 | -60/+66
2025-09-19 | [IR] enable attaching metadata on ifuncs (#158732) | Wael Yehia | 5 | -0/+30
Teach the IR parser and writer to support metadata on ifuncs, and update documentation. In PR #153049, we have a use case of attaching the `!associated` metadata to an ifunc. Since an ifunc is similar to a function declaration, it seems natural to allow metadata on ifuncs. Currently, the metadata API allows adding Metadata to llvm::GlobalObject, so the in-memory IR allows for metadata on ifuncs, but the IR reader/writer is not aware of that. --------- Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
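As a rough sketch of what this enables on the C++ side (a hypothetical helper; the commit itself only teaches the textual IR reader/writer to round-trip such metadata), attaching `!associated` to an ifunc uses the ordinary `GlobalObject` metadata API:

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/GlobalIFunc.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Metadata.h"
using namespace llvm;

// GlobalIFunc derives from GlobalObject, so setMetadata() already works on the
// in-memory IR; after this patch the printed IR also carries the attachment.
static void attachAssociated(GlobalIFunc &IF, GlobalVariable &Assoc) {
  LLVMContext &Ctx = IF.getContext();
  Metadata *Ops[] = {ConstantAsMetadata::get(&Assoc)};
  IF.setMetadata(LLVMContext::MD_associated, MDNode::get(Ctx, Ops));
}
```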
2025-09-19 | PPC: Replace PointerLikeRegClass with RegClassByHwMode (#158777) | Matt Arsenault | 4 | -24/+26
2025-09-19 | [PowerPC] Fix vector extend result types in BUILD_VECTOR lowering (#159398) | RolandF77 | 1 | -1/+5
The result type of the vector extend intrinsics generated by the BUILD_VECTOR lowering code should match how they are actually defined. Currently the result type is defaulting to the operand type there. This can conflict with calls to the same intrinsic from other paths.
2025-09-19 | [PowerPC] using millicode call for strlen instead of lib call (#153600) | zhijian lin | 6 | -6/+41
AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __strlen routine is a millicode implementation; we use millicode for the strlen function instead of a library call to improve performance.
2025-09-19 | Mips: Switch to RegClassByHwMode (#158273) | Matt Arsenault | 8 | -48/+80
2025-09-19 | X86: Switch to RegClassByHwMode (#158274) | Matt Arsenault | 7 | -41/+55
Replace the target uses of PointerLikeRegClass with RegClassByHwMode
2025-09-19 | [CodeGen][NewPM] Port `ReachingDefAnalysis` to new pass manager. (#159572) | Mikhail Gudim | 9 | -163/+207
In this commit:
(1) Added new pass manager support for `ReachingDefAnalysis`.
(2) Added a printer pass.
(3) Made the old pass manager use `ReachingDefInfoWrapperPass`.
2025-09-19 | X86: Avoid using isArch64Bit for 64-bit checks (#157412) | Matt Arsenault | 9 | -31/+30
Just directly check x86_64. isArch64Bit just adds extra steps around this.
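A minimal sketch of the distinction (an illustrative helper, not code from the patch):

```cpp
#include "llvm/TargetParser/Triple.h"
using namespace llvm;

// In x86-specific code the interesting question is "is this x86_64?", so the
// direct arch comparison is clearer; isArch64Bit() answers the same question
// indirectly by going through the architecture's pointer width.
static bool isX86_64(const Triple &TT) {
  return TT.getArch() == Triple::x86_64;
  // Equivalent within the X86 backend, but with extra steps:
  // return TT.isArch64Bit();
}
```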
2025-09-19 | SPARC: Use RegClassByHwMode instead of PointerLikeRegClass (#158271) | Matt Arsenault | 2 | -10/+19
2025-09-19 | [LLVM][CodeGen] Update PPCFastISel::SelectRet for ConstantInt based vectors. (#159331) | Paul Walker | 1 | -1/+2
The current implementation assumes ConstantInt return values are scalar, which is not true when use-constant-int-for-fixed-length-splat is enabled.
2025-09-19 | [LLVM][SCEV] Look through common vscale multiplicand when simplifying compares. (#141798) | Paul Walker | 1 | -1/+20
My use case is simplifying the control flow generated by LoopVectorize when vectorising loops whose trip count is a function of the runtime vector length. This can be problematic because:
* CSE is a pre-LoopVectorize transform and so it's common for an IR function to include several calls to llvm.vscale(). (NOTE: Code generation will typically remove the duplicates.)
* Pre-LoopVectorize instcombines will rewrite some multiplies as shifts.
This leads to a mismatch between the VL-based maths of the scalar loop and that created for the vector loop, which prevents some obvious simplifications. SCEV does not suffer these issues because it effectively does CSE during construction and shifts are represented as multiplies.
2025-09-19 | [X86] Fold X * 1 + Z --> X + Z for VPMADD52L (#158516) | Hongyu Chen | 1 | -1/+23
This patch implements the fold `lo(X * 1) + Z --> lo(X) + Z --> X + Z iff X == lo(X)`.
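A small scalar model of the arithmetic behind the fold (illustrative only; VPMADD52L itself operates on packed 64-bit lanes):

```cpp
#include <cassert>
#include <cstdint>

// VPMADD52L adds the low 52 bits of X * Y to Z. With Y == 1,
// lo52(X * 1) == lo52(X), and if X already fits in 52 bits the whole
// operation is just X + Z.
static uint64_t lo52(uint64_t V) { return V & ((1ULL << 52) - 1); }

int main() {
  uint64_t X = 0x000FABCD12345678ULL; // X == lo52(X): fits in 52 bits
  uint64_t Z = 42;
  assert(lo52(X * 1) + Z == lo52(X) + Z);
  assert(lo52(X) + Z == X + Z);
  return 0;
}
```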
2025-09-19 | CodeGen: Add RegisterClass by HwMode (#158269) | Matt Arsenault | 6 | -9/+17
This is a generalization of the LookupPtrRegClass mechanism. AMDGPU has several use cases for swapping the register class of instruction operands based on the subtarget, but none of them really fit into the box of being pointer-like. The current system requires manual management of an arbitrary integer ID. For the AMDGPU use case, this would end up being around 40 new entries to manage. This just introduces the base infrastructure. I have ports of all the target specific usage of PointerLikeRegClass ready.
2025-09-19 | [DA] Add overflow check in ExactSIV (#157086) | Ryotaro Kasuga | 1 | -1/+13
This patch adds an overflow check to the `exactSIVtest` function to fix the issue demonstrated in the test case added in #157085. This patch only fixes one of the routines. To fully resolve the test case, the other functions need to be addressed as well.
2025-09-19 | [X86] Allow all legal integers to optimize smin with 0 (#151893) | AZero13 | 1 | -1/+1
There is no reason smin has to be limited to 32 and 64 bits. hasAndNot only exists for 32 and 64 bits, so this does not affect smax.
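For intuition, a sketch of why the smin-with-zero form needs no and-not (a plausible reading of the fold being widened, not the exact DAG combine):

```cpp
#include <cassert>
#include <cstdint>

// smin(x, 0) is x AND the sign-extended sign bit: negative values keep
// themselves, non-negative values become 0. smax(x, 0) would need the
// complemented mask (an AND-NOT), which is where hasAndNot() matters.
// Shown on int16_t to make the "any legal integer width" point.
static int16_t sminWithZero(int16_t X) {
  int16_t Mask = X >> 15; // all ones if X is negative, zero otherwise
  return X & Mask;
}

int main() {
  assert(sminWithZero(-5) == -5);
  assert(sminWithZero(7) == 0);
  return 0;
}
```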
2025-09-19 | [llvm-debuginfo-analyzer] Add `--output-sort=(none|id)` option (#145761) | Javier Lopez-Gomez | 1 | -3/+9
- The output for `--output-sort=id` matches `--output-sort=offset` for the available readers. Tests were updated accordingly.
- For `--output-sort=none`, and per `LVReader::sortScopes()`, `LVScope::sort()` is called on the root scope. `LVScope::sort()` has no effect if `getSortFunction() == nullptr`, and thus the elements are currently traversed in the order in which they were initially added. This should change, however, after `LVScope::Children` is removed.
2025-09-19 | [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (#146075) | Fabian Ritter | 1 | -0/+13
If we can't fold a PTRADD's offset into its users, lowering them to disjoint ORs is preferable: Often, a 32-bit OR instruction suffices where we'd otherwise use a pair of 32-bit additions with carry. This needs to be a DAGCombine (and not a selection rule) because its main purpose is to enable subsequent DAGCombines for bitwise operations. We don't want to just turn PTRADDs into disjoint ORs whenever that's sound because this transform loses the information that the operation implements pointer arithmetic, which AMDGPU for instance needs when folding constant offsets. For SWDEV-516125.
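A tiny standalone demonstration of the identity the combine relies on (illustrative values, not from the patch):

```cpp
#include <cassert>
#include <cstdint>

// When a base address and an offset have no set bits in common, adding them
// and OR-ing them give the same result, so a PTRADD whose offset cannot be
// folded further may be lowered as a "disjoint" OR.
int main() {
  uint64_t Base = 0xFFFF0000u; // low 16 bits known to be zero
  uint64_t Off = 0x0000ABCDu;  // fits entirely in those low 16 bits
  assert((Base & Off) == 0);   // the disjointness precondition
  assert(Base + Off == (Base | Off));
  return 0;
}
```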
2025-09-19 | Fix NDEBUG Wundef warning; NFC (#159539) | Sven van Haastregt | 1 | -1/+1
The `NDEBUG` macro is tested for defined-ness everywhere else. The instance here triggers a warning when compiling with `-Wundef`.
2025-09-19 | RISC-V: builtins support for MIPS RV64 P8700 execution control. | UmeshKalappa | 2 | -1/+8
The following changes are made:
a) Typo fix (from the previous PR https://github.com/llvm/llvm-project/pull/155747)
b) Builtins support for MIPS P8700 execution control instructions
c) Test case
2025-09-19 | [SDAG][AMDGPU] Allow opting in to OOB-generating PTRADD transforms (#146074) | Fabian Ritter | 3 | -100/+88
This PR adds a TargetLowering hook, canTransformPtrArithOutOfBounds, that targets can use to allow transformations to introduce out-of-bounds pointer arithmetic. It also moves two such transformations from the AMDGPU-specific DAG combines to the generic DAGCombiner. This is motivated by target features like AArch64's checked pointer arithmetic, CPA, which does not tolerate the introduction of out-of-bounds pointer arithmetic.
2025-09-19 | [AMDGPU][SDAG] Handle ISD::PTRADD in various special cases (#145330) | Fabian Ritter | 4 | -11/+23
There are more places in SIISelLowering.cpp and AMDGPUISelDAGToDAG.cpp that check for ISD::ADD in a pointer context, but as far as I can tell those are only relevant for 32-bit pointer arithmetic (like frame indices/scratch addresses and LDS), for which we don't enable PTRADD generation yet. For SWDEV-516125.
2025-09-19 | [RISCV] Implement MC support for Zvfofp8min extension (#157014) | Jim Lin | 5 | -5/+46
This patch adds MC support for Zvfofp8min https://github.com/aswaterman/riscv-misc/blob/main/isa/zvfofp8min.adoc.
2025-09-19 | [SeparateConstOffsetFromGEP] Check if non-extracted indices may be negative when preserving inbounds (#159515) | Fabian Ritter | 1 | -6/+7
If we know that the initial GEP was inbounds, and we change it to a sequence of GEPs from the same base pointer where every offset is non-negative, then the new GEPs are inbounds. So far, the implementation only checked if the extracted offsets are non-negative. In cases where non-extracted offsets can be negative, this would cause the inbounds flag to be wrongly preserved. Fixes an issue in #130617 found by nikic.
2025-09-19 | [RISCV][GISel] Support select vx, vf form rvv intrinsics (#157398) | Jianjian Guan | 4 | -2/+51
For the vx form, we legalize it by widening the scalar. For the vf form, we select the right register bank.