path: root/llvm/lib
Age | Commit message | Author | Files | Lines
2025-10-23[WebAssembly] [Codegen] Add pattern for relaxed min max from pmin/pmax-based patterns over v4f32 and v2f64 (#164486)Jasmine Tang1-12/+24
Related to https://github.com/llvm/llvm-project/issues/55932
2025-10-23[InstCombine] Allow folding cross-lane operations into PHIs/selects (#164388)Benjamin Maxwell3-8/+24
Previously, cross-lane operations were disallowed here, but they are only problematic if the `select` condition is a vector, as the input of the operation is not simply one of the arms of the phi/select.
2025-10-23[Analysis] Use the addCost() helper across InlineCost.cpp (#141901)Gleb Popov1-3/+3
For the sake of consistency.
2025-10-23[RISCV][GISel] Fold G_FCONSTANT 0.0 store into G_CONSTANT x0 (#163008)Shaoce SUN2-1/+61
2025-10-23[SDAG] Introduce inbounds flag for ISD::PTRADD (#162477)Fabian Ritter3-1/+10
This patch introduces SDNodeFlags::InBounds, to show that an ISD::PTRADD SDNode implements an inbounds getelementptr operation (i.e., the pointer operand is in bounds wrt. an allocated object it is based on, and the arithmetic does not change that). The flag is set in the DAG construction when lowering inbounds GEPs. Inbounds information is useful in the ISel when selecting memory instructions that perform address computations whose intermediate steps must be in the same memory region as the final result. Follow-up patches to propagate the flag in DAGCombines and to use it when lowering AMDGPU's flat memory instructions, where the immediate offset must not affect the memory aperture of the address (similar to this GISel patch: #153001), are planned. This mirrors #150900, which has introduced a similar flag in GlobalISel. This patch supersedes #131862, which previously attempted to introduce an SDNodeFlags::InBounds flag. The difference between this PR and #131862 is that there is now an ISD::PTRADD opcode (PR #140017) and the InBounds flag is only defined to apply to ISD::PTRADD DAG nodes. It is therefore unambiguous that in-bounds-ness refers to a memory object into which the left operand of the PTRADD node points (in contrast to #131862, where InBounds would have applied to commutative ISD::ADD nodes, so that the semantics would be more difficult to reason about). For SWDEV-516125.
2025-10-23[SelectionDAG] Legalize <1 x T> vector types for atomic load (#148894)jofrn2-0/+16
`load atomic <1 x T>` is not valid. This change legalizes vector types of atomic load via scalarization in SelectionDAG so that it can, for example, translate from `v1i32` to `i32`.
2025-10-23IR/Verifier: Allow vector type in atomic load and store (#148893)jofrn1-6/+9
Vector types on atomics are assumed to be invalid by the verifier. However, this type can be valid if it is lowered by codegen.
2025-10-23[Passes] Report error when pass requires target machine (#142550)paperchalice5-23/+98
Fixes #142146. Do a nullptr check when a pass accepts `const TargetMachine &` in its constructor; this is still not exhaustive.
2025-10-23[NFC][GlobPattern] Add GlobPattern::longest_substr() (#164512)Vitaly Buka1-0/+48
Finds the longest (almost) plain substring in the pattern. The implementation is conservative to avoid false positives. The result is not used to optimize `GlobPattern::match()`, so it is calculated on request. For https://github.com/llvm/llvm-project/pull/164545 --------- Co-authored-by: Luke Lau <luke@igalia.com>
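As a rough illustration of the idea (not the actual `GlobPattern` code), a conservative scan can treat every glob metacharacter as a run boundary and return the longest run of literal characters between them; any string matching the pattern must contain that run. The helper name `longestPlainSubstr` and the chosen metacharacters (`*`, `?`, `\`, `[`) are assumptions for this sketch. Bracket expressions and escaped characters are skipped entirely, which may miss a longer run but never reports a substring that is not actually required.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: longest run of literal characters in a glob pattern.
// Metacharacters end a run; escapes and bracket expressions are skipped so
// the result is conservative (no false positives).
std::string_view longestPlainSubstr(std::string_view pat) {
  std::string_view best;
  std::size_t runStart = 0;
  std::size_t i = 0;
  auto closeRun = [&](std::size_t end) {
    if (end - runStart > best.size())
      best = pat.substr(runStart, end - runStart);
  };
  while (i < pat.size()) {
    char c = pat[i];
    if (c != '*' && c != '?' && c != '\\' && c != '[') {
      ++i; // literal character: extend the current run
      continue;
    }
    closeRun(i); // a metacharacter ends the current run
    if (c == '\\') {
      i += 2; // skip the escape and the escaped character
    } else if (c == '[') {
      std::size_t j = i + 1;
      if (j < pat.size() && (pat[j] == '!' || pat[j] == '^'))
        ++j;
      if (j < pat.size() && pat[j] == ']')
        ++j; // a leading ']' is part of the set
      while (j < pat.size() && pat[j] != ']')
        ++j;
      i = j + 1; // step past the closing ']'
    } else {
      ++i; // '*' or '?'
    }
    if (i > pat.size())
      i = pat.size(); // trailing '\' or unterminated '['
    runStart = i;
  }
  closeRun(pat.size());
  return best;
}
```

For example, `*foo*barbaz?` yields `barbaz`, while the letters inside `[abcdef]` are never reported.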
2025-10-23[RegAlloc] Constrain rematted regclass to use (#164386)Luke Lau1-0/+3
When rematerializing, we create a new virtual register with the original def's register class. However, the use may have a different register class if the interval is split, which means we end up with an invalid register class. This fixes #164181 by constraining the newly created register to the use's register class. The test case has been reduced as far as possible. Because this test requires us to reach a certain amount of register pressure in certain conditions, I'm not sure if there's an easy way to handwrite this scenario.
2025-10-23[llvm][RISCV] Handle fpround and fpextend for zvfbfa without zvfbfmin (#164366)Brandon Wu1-0/+34
Add codegen support for fpround and fpextend for zvfbfa. Resolves https://github.com/llvm/llvm-project/issues/164324
2025-10-23CodeGen: Fix crash when no libcall is available for stackguard (#164211)Matt Arsenault2-10/+25
Not all the paths appear to be implemented for GlobalISel
2025-10-22[AMDGPU] Add intrinsics for v_[pk]_add_{min|max}_* instructions (#164731)Stanislav Mekhanoshin6-14/+41
2025-10-23[LoongArch] Move widenShuffleMask before lowerVECTOR_SHUFFLE_XVPERMI to improve code quality (#164219)Zhaoxin Yang1-0/+3
2025-10-22[NFC][GlobPattern] Change internal structure of GlobPattern (#164513)Vitaly Buka1-8/+11
Replace two StringRefs with one StringRef plus two size_t values. Prepares for https://github.com/llvm/llvm-project/pull/164512
2025-10-22[HLSL] Allow completely unused cbuffers (#164557)Justin Bogner1-2/+7
We were checking for cbuffers where the global was removed, but if the buffer is completely unused the whole thing can be null. --------- Co-authored-by: Helena Kotas <hekotas@microsoft.com>
2025-10-22[MIR2Vec][llvm-ir2vec] Add MIR2Vec support to llvm-ir2vec tool (#164025)S. VenkataKeerthy1-55/+36
Add MIR2Vec support to the llvm-ir2vec tool, enabling embedding generation for Machine IR alongside the existing LLVM IR functionality. (This is an initial integration; generation of other entities and triplets for vocabulary generation will follow in separate patches.)
2025-10-22[llvm] Update call graph ELF section type. (#164461)Prabhu Rajasekaran5-2/+8
Give the call graph section a dedicated section type instead of the generic PROGBITS type.
2025-10-22Revert "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)"Florian Hahn5-106/+52
This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c. There have been reports of mis-compiles in https://github.com/llvm/llvm-project/pull/149706. Revert while I investigate.
2025-10-22[MemoryLocation] Support strided matrix loads / stores (#163368)Nathan Corbyn1-0/+28
This patch provides an approximation of the memory locations touched by `llvm.matrix.column.major.load` and `llvm.matrix.column.major.store`, enabling dead store elimination and GVN to remove redundant loads and dead stores. PR: https://github.com/llvm/llvm-project/pull/163368
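To make "an approximation of the memory locations touched" concrete: a column-major access of a Rows x Cols matrix reads or writes `Rows` consecutive elements per column, with column starts `Stride` elements apart (stride assumed here to be a constant measured in elements). A single contiguous range covering everything, gaps between columns included, is one plausible over-approximation. The helper below is a hypothetical sketch, not the code in `MemoryLocation`, and does not show how non-constant strides are handled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of bytes, counted from the base pointer, that a
// column-major load/store of a rows x cols matrix may touch. Gaps between
// columns are included, so this over-approximates the real footprint.
std::uint64_t matrixAccessExtentBytes(std::uint64_t rows, std::uint64_t cols,
                                      std::uint64_t stride,
                                      std::uint64_t elemSize) {
  if (rows == 0 || cols == 0)
    return 0;
  // The last element accessed sits at index (cols - 1) * stride + (rows - 1).
  return ((cols - 1) * stride + rows) * elemSize;
}
```

For a 4x3 matrix of doubles with stride 4 the range is exactly 96 bytes; with stride 10 it grows to 192 bytes even though only 96 bytes are accessed.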
2025-10-22[ThinLTO] Add HasLocal flag to GlobalValueSummaryInfo (#164647)Teresa Johnson1-0/+4
Add a flag to the GlobalValueSummaryInfo indicating whether the associated SummaryList (all summaries with the same GUID) contains any summaries with local linkage. This flag is set when building the index, so it is associated with the original linkage type before internalization and promotion. Consumers should check the withInternalizeAndPromote() flag on the index before using it. In most cases we expect a 1-1 mapping between a GUID and a summary with local linkage, because for locals the GUID is computed from the hash of "modulepath;name". However, there can be multiple locals with the same GUID if translation units are not compiled with enough of their path to distinguish them. And in rare but theoretically possible cases, there can be hash collisions in the underlying MD5 computation. So, to be safe when looking for local summaries, analyses currently look through all summaries in the list. These lists can be extremely long in the case of large binaries with template function definitions in widely used headers (i.e. linkonce_odr). A follow-on change will use this flag to reduce ThinLTO analysis time in WPD by 5-6% for a large target (details in PR164046, which will be reworked to use this flag). Note that in the past we have tried to keep bits related to the GUID in the ValueInfo (which has a pointer to the associated GlobalValueSummaryInfo), via its PointerIntPair. However, we are out of bits there. This change does add a byte to every GlobalValueSummaryInfo instance, which I measured as a little under 0.90% overhead in a large target. However, it enables adding 7 bits of other per-GUID flags in the future without adding more overhead. Note that it was lower overhead to add this to the GlobalValueSummaryInfo than to the ValueInfo, which tends to be copied into other maps.
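The benefit of a per-GUID flag can be sketched with simplified stand-in types: the flag is accumulated as summaries are added, and a lookup for local summaries can return immediately for lists that contain none, such as long linkonce_odr lists. `Summary`, `SummaryInfo`, `addSummary` and `findLocal` are hypothetical names, not the real ThinLTO classes, and the sketch ignores the internalization/promotion caveat described above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the real summary classes.
enum class Linkage { External, LinkOnceODR, Internal, Private };

struct Summary {
  Linkage linkage;
  bool isLocal() const {
    return linkage == Linkage::Internal || linkage == Linkage::Private;
  }
};

// Per-GUID entry: the summary list plus a flag computed while the index is
// being built.
struct SummaryInfo {
  std::vector<Summary> summaryList;
  bool hasLocal = false;

  void addSummary(const Summary &s) {
    hasLocal = hasLocal || s.isLocal();
    summaryList.push_back(s);
  }
};

// Returns the first local summary, or nullptr. The flag lets callers skip
// scanning long lists that contain no locals at all.
const Summary *findLocal(const SummaryInfo &info) {
  if (!info.hasLocal)
    return nullptr;
  for (const Summary &s : info.summaryList)
    if (s.isLocal())
      return &s;
  return nullptr;
}
```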
2025-10-22[MIR2Vec] Handle Operands (#163281)S. VenkataKeerthy2-78/+327
Handling operands in embedding computation. - Revamped MIR Vocabulary with four sections: `Opcodes`, `Common Operands`, `Physical Registers`, and `Virtual Registers`. - Operands broadly fall into three categories: the generic MO types that are common across architectures, physical register classes, and virtual register classes. We handle these categories separately in MIR2Vec. (Although physical and virtual registers share the same classes, their embeddings differ.)
2025-10-22[DirectX] Fix crash when naming buffers of arrays (#164553)Justin Bogner1-0/+8
DXILResource was falling over trying to name a resource type that contained an array, such as `StructuredBuffer<float[3][2]>`. Handle this by walking through array types to gather the dimensions.
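Walking through nested array types to collect their dimensions can be sketched with a toy type model: an IR type such as `[3 x [2 x float]]` is an array of 3 arrays of 2 floats, and peeling off each array level yields the dimensions in source order. The `Type` struct and `typeName` function are hypothetical illustrations, not the DXILResource implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical type model: a named scalar when `elem` is null, otherwise an
// array of `numElements` values of type `*elem`.
struct Type {
  std::string scalarName;
  std::uint64_t numElements = 0;
  std::shared_ptr<Type> elem;
};

// Builds a name such as "float[3][2]" by walking through the array levels.
std::string typeName(const Type &t) {
  std::vector<std::uint64_t> dims;
  const Type *cur = &t;
  while (cur->elem) { // peel one array level at a time
    dims.push_back(cur->numElements);
    cur = cur->elem.get();
  }
  std::string name = cur->scalarName;
  for (std::uint64_t d : dims)
    name += "[" + std::to_string(d) + "]";
  return name;
}
```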
2025-10-22[SPIRV][HLSL] Fix assert with cbuffers through constexpr (#164555)Justin Bogner1-6/+8
The comment here pointed out that RAUW would fall over given a constantexpr, but then proceeded to just do what RAUW does by hand, which falls over in the same way. Instead, convert constantexprs involving cbuffer globals to instructions before processing them. The test update just modifies the existing cbuffer test, since it implied it was trying to test this exact case anyways.
2025-10-22[BasicBlockUtils] Add BasicBlock printer (#163066)Robert Imschweiler1-0/+10
2025-10-22[SCEV] Expose getGEPExpr without needing to pass GEPOperator* (NFC) (#164487)Florian Hahn1-6/+11
Add a new getGEPExpr variant which is independent of GEPOperator*. To be used to construct SCEVs for VPlan recipes in https://github.com/llvm/llvm-project/pull/161276. PR: https://github.com/llvm/llvm-project/pull/164487
2025-10-22[LLVM][IR] Add location tracking to LLVM IR parser (#155797)Bertik236-19/+141
This PR is part of the LLVM IR LSP server project ([RFC](https://discourse.llvm.org/t/rfc-ir-visualization-with-vs-code-extension-using-an-lsp-server/87773)). To build an LSP server, it's crucial to have location information about the LLVM objects (Functions, BasicBlocks and Instructions). This PR adds: * Position tracking to the Lexer * A new AsmParserContext class, to hold the new position info * Tests to check that the locations are correct. The AsmParserContext can be passed as an optional parameter to the parser, which populates it; it can then be used by other tools, such as the LSP server. The AsmParserContext idea was borrowed from MLIR, as we didn't want to store data that no one else uses inside the objects themselves. The implementation is different, though: this class holds several maps that map Functions, BasicBlocks and Instructions to their locations. Some utility methods were also added to get the positions of the processed tokens.
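Position tracking in a lexer amounts to updating a line/column pair as each character is consumed and recording it at token boundaries. The sketch below splits text on whitespace only and uses hypothetical `FilePos`/`FileRange`/`Token` types; it is not the `AsmParserContext` API or the real LLVM IR lexer.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical position types: 0-based line and column.
struct FilePos {
  unsigned line = 0;
  unsigned col = 0;
};
struct FileRange {
  FilePos start, end; // end is one past the last character
};
struct Token {
  std::string_view text;
  FileRange range;
};

// Splits `buf` into whitespace-separated tokens, updating line/column as each
// character is consumed so every token carries its source range.
std::vector<Token> lexWithPositions(std::string_view buf) {
  std::vector<Token> toks;
  FilePos pos;
  std::size_t i = 0;
  auto isSpace = [&](std::size_t k) {
    return std::isspace(static_cast<unsigned char>(buf[k])) != 0;
  };
  auto step = [&] {
    if (buf[i] == '\n') {
      ++pos.line;
      pos.col = 0;
    } else {
      ++pos.col;
    }
    ++i;
  };
  while (i < buf.size()) {
    if (isSpace(i)) {
      step();
      continue;
    }
    std::size_t begin = i;
    FilePos start = pos;
    while (i < buf.size() && !isSpace(i))
      step();
    toks.push_back({buf.substr(begin, i - begin), {start, pos}});
  }
  return toks;
}
```

A tool such as an LSP server could then map a cursor position back to the token whose range contains it.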
2025-10-22[ThinLTO] Add index flag for internalization/promotion status (#164530)Teresa Johnson3-2/+10
Add an index-wide flag indicating whether index-based internalization and promotion have completed. This will be used in a follow-on change.
2025-10-22[BPF] Do not emit names for PTR, CONST, VOLATILE and RESTRICT BTF types (#163174)Michal R1-1/+18
We currently raise a warning in `print_btf.py` when any of these types have a name. The Linux kernel doesn't allow names in these types either.[0] However, nothing stops frontends from giving names to these types. To make sure that they are always anonymous, explicitly skip the name emission. [0] https://elixir.bootlin.com/linux/v6.17.1/source/kernel/bpf/btf.c#L2586
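The rule amounts to a kind-based filter applied at emission time, so whatever name a frontend attached is dropped for the four modifier kinds. The enum below is a hypothetical subset of BTF kinds and `emittedName` is an illustrative helper, not the BPF backend's code.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of BTF kinds, for illustration only.
enum class BTFKind { Int, Ptr, Struct, Typedef, Volatile, Const, Restrict };

// Kinds that must always be anonymous.
bool mustBeAnonymous(BTFKind k) {
  switch (k) {
  case BTFKind::Ptr:
  case BTFKind::Const:
  case BTFKind::Volatile:
  case BTFKind::Restrict:
    return true;
  default:
    return false;
  }
}

// Name to emit for a type: empty for the modifier kinds, regardless of what
// name the frontend attached; unchanged otherwise.
std::string emittedName(BTFKind k, const std::string &frontendName) {
  return mustBeAnonymous(k) ? std::string() : frontendName;
}
```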
2025-10-22[LV] Ignore user-specified interleave count when unsafe. (#153009)Kerry McLaughlin1-1/+11
When a VF is specified via a loop hint, it will be clamped to a safe VF or ignored if it is found to be unsafe. This is not the case for user-specified interleave counts, which can lead to loops such as the following, which has a memory dependence, being vectorised with interleaving:
```
#pragma clang loop interleave_count(4)
for (int i = 4; i < LEN; i++)
  b[i] = b[i - 4] + a[i];
```
According to [1], loop hints are ignored if they are not safe to apply. This patch adds a check to prevent vectorisation with interleaving if isSafeForAnyVectorWidth() returns false. This is already checked in selectInterleaveCount(). [1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave
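Why `interleave_count(4)` is unsafe here can be seen with a simplified model: with a dependence distance of 4 iterations (`b[i]` reads `b[i - 4]`), processing more than 4 iterations at once reads elements not yet written, so VF x IC must stay within the distance. The helpers below are hypothetical and only mirror the rule described above; when the loop is not safe for any vector width, the user's count is ignored in favor of the cost model's choice.

```cpp
#include <cassert>

// Simplified model: a loop-carried dependence of `dist` iterations tolerates
// at most `dist` iterations being processed together.
bool isSafeWidth(unsigned vf, unsigned ic, unsigned dist) {
  return vf * ic <= dist;
}

// Hypothetical selection rule: honor a user interleave count (0 = none given)
// only when the loop is safe for any vector width; otherwise fall back to the
// cost-model choice, which already respects the dependence.
unsigned chooseInterleaveCount(unsigned userIC, unsigned costModelIC,
                               bool safeForAnyWidth) {
  if (userIC != 0 && safeForAnyWidth)
    return userIC;
  return costModelIC;
}
```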
2025-10-22[PredicateInfo] Reserve adjacent LN_Last defs for the same phi use (#164577)Kunqiu Chen1-3/+10
This patch fixes a missed optimization: predicate infos might be lost in phi-use scenarios. Due to the existence of and-chains, a phi use might be associated with multiple LN_Last predicate infos. E.g.,
```cpp
// TWO LN_Last Predicate Info defs:
// 1. a >= 1
// 2. a < 2
if (a < 1 || a >= 2) {
  a = 1;
}
// PHI use of `a`
use(a);
```
However, previously, `popStackUntilDFSScope` reserved only ONE LN_Last def for a phi use (i.e., only one of `a >= 1` / `a < 2`), although there might be multiple LN_Last defs for the same phi use. This patch reserves the adjacent LN_Last defs if they are designated for the same phi use.
2025-10-22[CodeGen] Add "override" where appropriate (NFC) (#164571)Kazu Hirata7-8/+8
Note that "override" makes "virtual" redundant. Identified with modernize-use-override.
2025-10-22[AMDGPU] Reland "Remove redundant s_cmp_lg_* sX, 0" (#164201)LU-JOHN2-6/+109
Reland PR https://github.com/llvm/llvm-project/pull/162352. Fix by excluding SI_PC_ADD_REL_OFFSET from instructions that set SCC = DST!=0. Passes check-libc-amdgcn-amd-amdhsa now. Distribution of instructions that allowed a redundant S_CMP to be deleted in check-libc-amdgcn-amd-amdhsa test:
```
S_AND_B32        485
S_AND_B64         47
S_ANDN2_B32       42
S_ANDN2_B64   277492
S_CSELECT_B64  17631
S_LSHL_B32         6
S_OR_B64          11
```
--------- Signed-off-by: John Lu <John.Lu@amd.com> Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
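The underlying peephole can be sketched on a toy instruction list: if an instruction defines a register and already sets SCC to (reg != 0), an immediately following `s_cmp_lg_* reg, 0` recomputes the same SCC value and can be dropped. `Inst` and `removeRedundantCmp` are hypothetical; a real pass would operate on MachineInstrs and scan back while checking that neither the register nor SCC is redefined, whereas this sketch only looks at adjacent pairs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy scalar instruction (not a MachineInstr).
struct Inst {
  std::string opcode;          // e.g. "s_and_b32", "s_cmp_lg_u32"
  std::string reg;             // def for ALU ops, first source for s_cmp
  long imm = 0;                // second operand of s_cmp (unused otherwise)
  bool setsSccNonZero = false; // ALU op that sets SCC = (reg != 0)
};

// Drops "s_cmp_lg_* reg, 0" when the previous instruction defined reg and
// already set SCC = (reg != 0). Only adjacent pairs are considered.
std::vector<Inst> removeRedundantCmp(const std::vector<Inst> &in) {
  std::vector<Inst> out;
  for (const Inst &i : in) {
    bool isCmpLgZero = i.opcode.rfind("s_cmp_lg", 0) == 0 && i.imm == 0;
    if (isCmpLgZero && !out.empty() && out.back().setsSccNonZero &&
        out.back().reg == i.reg)
      continue; // SCC already holds (reg != 0); the compare is redundant
    out.push_back(i);
  }
  return out;
}
```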
2025-10-22[AgressiveInstCombine] Merge debug info on merged stores (#164449)Orlando Cazalet-Hyams1-1/+11
A bit of debug info maintenance for #147540.
2025-10-22[VPlan] Skip masked interleave groups in narrowInterleaveGroups.Florian Hahn1-1/+1
8d29d09309 exposed a crash due to incorrectly trying to handle masked interleave recipes. The current code does not support masked interleave recipes, so bail out for them for now.
2025-10-22Reapply "[Polly] Update ScopInliner for NPM (#125427)" (#164601)Michael Kruse1-5/+5
An assertion that assumed there would be only Polly passes failed when Polly was registering with the pass manager. Since this does not need to be the case, re-apply with the assert removed. Includes a non-Polly change so that premerge CI triggers check-llvm, which failed for 0b9a7b80c0674c5c6f746139912111bea7eae63b but was not caught by pre-merge testing.
2025-10-22[X86] Fix some values for Znver4 model (#161405)NexusXe1-47/+63
This PR fixes a handful of latency and uop values for Znver4 that had been copied unchanged from Znver3. Listed latency and uop values that matched the Zen3 values on uops.info were updated to the Zen4 values. Includes: BSF/BSR, DIV, TZCNT, CLMUL, PCMPISTRM, VALIGN, VPERM
2025-10-22[LLVM][CodeGen][SVE] Fix typo in PPR_p8to15's DecoderMethod. (#164429)Paul Walker1-1/+1
2025-10-22[LLVM][CodeGen][AArch64] Fix global-isel for LD1R. (#164418)Paul Walker2-7/+7
LD1Rv8b only supports a base register, but the DAG is matched using am_indexed8, and the offset it finds is silently dropped. I've also fixed a couple of immediate operand type inconsistencies that don't manifest as bugs, because their incorrect scaling is overridden by the complex pattern and MachineInstr, which are correct; thus there's nothing to test.
2025-10-22[WPD]: Enable speculative devirtualizatoin. (#159048)Hassnaa Hamdi1-18/+60
This patch implements the speculative devirtualization feature in the LLVM backend. It handles the case of single implementation devirtualization where there is a single possible callee of a virtual function. - Add cl::opt 'devirtualize-speculatively' to enable it. - Flag is disabled by default. - It works regardless of the visibility of the object. - Not enabled for LTO for now.
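The shape of speculative devirtualization can be shown by hand, modeling a vtable slot as a plain function pointer (an assumption made for a self-contained example): the loaded target is compared against the single known implementation, the fast path calls it directly (and can then be inlined), and the original indirect call remains as the fallback. The fallback is what keeps the transformation correct when other implementations might exist, which is why it can apply regardless of the object's visibility. `Shape`, `squareArea` and the two call helpers are hypothetical.

```cpp
#include <cassert>

// Hypothetical model: a vtable slot represented as a plain function pointer.
struct Shape;
using AreaFn = int (*)(const Shape &);

struct Shape {
  AreaFn area; // stands in for the virtual function slot
  int side;
};

// The single known implementation.
int squareArea(const Shape &s) { return s.side * s.side; }

// Before: always an indirect call through the slot.
int callIndirect(const Shape &s) { return s.area(s); }

// After: guarded direct call on the fast path, indirect call as fallback.
int callSpeculative(const Shape &s) {
  if (s.area == &squareArea)
    return squareArea(s); // direct call, can be inlined
  return s.area(s);       // some other implementation: keep the indirect call
}
```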
2025-10-22[NFC] "unsafe-fp-math" post cleanup (code comments part) (#164582)paperchalice4-7/+7
2025-10-22[ShrinkWrap] Consider constant pool access as non-stack access (#164393)Sushant Gokhale1-1/+1
As far as I understand, constant pool access does not access the stack; it accesses read-only memory. This patch treats constant pool access as non-stack access, allowing shrink wrapping to happen in the concerned test. We should see a perf improvement on the povray benchmark from SPEC17 (around 12% with -flto -Ofast) after this patch. An NFC PR #162476 already exists to upload the test before the patch, but its approval has been delayed. So, as @davemgreen suggested in that PR, I have uploaded the test and the patch in this single PR to show what the test looks like.
2025-10-22[AutoUpgrade] Gracefully handle invalid alignment on masked intrinsicsNikita Popov1-8/+22
Generate a usage error instead of asserting.
2025-10-22[WebAssembly] [Codegen] Add pattern for relaxed min max from fminimum/fmaximum over v4f32 and v2f64 (#162948)Jasmine Tang2-0/+22
Related to #55932
2025-10-22[DAG] Create SDPatternMatch method `m_SelectLike` to match `ISD::Select` and `ISD::VSelect` (#164069)kper1-20/+19
Fixes #150019
2025-10-22[AllocToken] Make token mode a pass parameter (#163634)Marco Elver4-18/+41
Refactor the AllocToken pass to accept the mode via pass options rather than an LLVM cl::opt. This is cleaner, and it is also required to make the mode frontend-driven and avoid potential inconsistencies.
2025-10-22[FastIsel] Get the right register type for call instruction (#164565)Luo Yuanke1-1/+6
When switching from FastISel to SelectionDAG ISel, the input value comes from an LLVM IR instruction. If the instruction is a call, we should get the calling convention of the callee and pass it to RegsForValue::getCopyFromRegs, so that it can deduce the right RegisterVT of the value returned by the callee. --------- Co-authored-by: Yuanke Luo <ykluo@birentech.com>
2025-10-22Revert "[InstCombinePHI] Enhance PHI CSE to remove redundant phis" (#164520)Arthur Eubanks1-91/+1
Reverts llvm/llvm-project#163453 Causes crashes, see https://github.com/llvm/llvm-project/pull/163453#issuecomment-3429922732
2025-10-22[AllocToken] Refactor stateless token calculation into Support (#163633)Marco Elver3-39/+69
Refactor the stateless (hash-based) token calculation logic out of the `AllocToken` pass and into `llvm/Support/AllocToken.h`. This helps with making the token calculation logic available to other parts of the codebase, which will be necessary for frontend implementation of `__builtin_infer_alloc_token` to perform constexpr evaluation. The `AllocTokenMode` enum and a new `AllocTokenMetadata` struct are moved into a shared header. The `getAllocTokenHash()` function now provides the source of truth for calculating token IDs for `TypeHash` and `TypeHashPointerSplit` modes.
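What "stateless" buys can be sketched as follows: the token depends only on its inputs (mode, type name, pointer-ness, token count), so a frontend evaluating it at compile time gets the same answer as the pass. Everything here is an assumption for illustration: FNV-1a stands in for whatever hash the real code uses, and the pointer-split layout (lower half for pointer-free types, upper half for pointer-containing ones) is a guess at the spirit of `TypeHashPointerSplit`, not its definition. `Mode`, `fnv1a` and `tokenFor` are hypothetical names, not `getAllocTokenHash()`.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Placeholder hash (64-bit FNV-1a); not the hash used by the real code.
constexpr std::uint64_t fnv1a(std::string_view s) {
  std::uint64_t h = 14695981039346656037ull;
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ull;
  }
  return h;
}

enum class Mode { TypeHash, TypeHashPointerSplit };

// Hypothetical stateless token computation; constexpr, so it can also be
// evaluated at compile time. maxTokens must be at least 2 for the split mode.
constexpr std::uint64_t tokenFor(Mode mode, std::string_view typeName,
                                 bool containsPointer,
                                 std::uint64_t maxTokens) {
  std::uint64_t h = fnv1a(typeName);
  if (mode == Mode::TypeHash)
    return h % maxTokens;
  // Assumed split: one half of the token space per pointer-ness class.
  std::uint64_t half = maxTokens / 2;
  return (containsPointer ? half : 0) + h % half;
}
```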
2025-10-22[ARM][MVE] Invalid tail predication in LowOverheadLoop pass (#163941)Simon Tatham3-0/+28
When a loop is converted into a low-overhead loop using tail predication via FPSCR.LTPSIZE, the MQPRCopy pseudo-instruction is expanded into either two VMOVD or a single MVE_VORR, depending on whether the values written to the lanes with a 'false' predicate matter. (MVE_VORR uses the ambient LTPSIZE predicate, so it won't write those lanes at all; the double VMOVD is slower but gets them right.) This check was done based on whether the output of the MQPRCopy is live coming out of the loop. But it missed a case where the live-out value is not _itself_ an MQPRCopy, but is a predicated operation taking its false lanes from an MQPRCopy. Fixes #162644, and adds a new MIR test case derived from the reproducer in that bug.
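The corrected decision can be sketched on a toy value graph: an MQPRCopy needs the slower two-VMOVD expansion if any live-out value is the copy itself (the case already handled) or a predicated operation whose false lanes come from the copy (the missed case). `Value`, `Expansion` and `chooseExpansion` are hypothetical and are not the LowOverheadLoop pass's data structures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical value model: for a predicated operation, `falseLanesFrom`
// points at the value supplying its inactive lanes (nullptr otherwise).
struct Value {
  const Value *falseLanesFrom = nullptr;
};

enum class Expansion { SingleVORR, TwoVMOVD };

// Chooses how to expand `mqprCopy` given the values live out of the loop.
Expansion chooseExpansion(const Value &mqprCopy,
                          const std::vector<const Value *> &liveOuts) {
  for (const Value *v : liveOuts) {
    if (v == &mqprCopy)
      return Expansion::TwoVMOVD; // the copy itself is live out
    if (v->falseLanesFrom == &mqprCopy)
      return Expansion::TwoVMOVD; // its false lanes reach the loop exit
  }
  return Expansion::SingleVORR; // false lanes are never observed
}
```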