|
When building with the latest MSVC on Windows, this fixes some compile-time
warnings from last week's integration in
https://github.com/llvm/llvm-project/pull/157885:
```
[321/5941] Building CXX object lib\Support\LSP\CMakeFiles\LLVMSupportLSP.dir\Transport.cpp.obj
C:\git\llvm-project\llvm\lib\Support\LSP\Transport.cpp(123): warning C4930: 'std::lock_guard<std::mutex> responseHandlersLock(llvm::lsp::MessageHandler::ResponseHandlerTy)': prototyped function not called (was a variable definition intended?)
[384/5941] Building CXX object unittests\Support\LSP\CMakeFiles\LLVMSupportLSPTests.dir\Transport.cpp.obj
C:\git\llvm-project\llvm\unittests\Support\LSP\Transport.cpp(190): warning C4804: '+=': unsafe use of type 'bool' in operation
```
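For reference, a minimal sketch of both warning patterns and their usual fixes (hypothetical names, not the actual LLVM code):
```cpp
#include <mutex>

std::mutex M;

void sketch(bool Flag) {
  // C4930 ("prototyped function not called"): passing a type declares a
  // function instead of constructing a lock (the most vexing parse).
  //   std::lock_guard<std::mutex> Lock(std::mutex);
  // Fixed: construct the guard from an actual mutex object.
  std::lock_guard<std::mutex> Lock(M);

  // C4804 ("unsafe use of type 'bool'"): arithmetic directly on a bool.
  int Count = 0;
  //   Count += Flag;
  // Fixed: make the integral conversion explicit.
  Count += static_cast<int>(Flag);
  (void)Count;
}
```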
|
|
Make the actual use context less ugly.
|
|
If a COPY uses Reg but only in an implicit operand, then the new
implementation ignores it, but the old implementation would have treated
it as a copy of Reg. This case probably never occurs in practice. Other
than that, this patch is NFC.
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
|
|
Relative lookup tables result in "LLVM ERROR: Circular dependency found
in global variable set", so disable them for this target.
|
|
This commit custom legalizes `ConstantFP` using a code sequence
rather than simply loading the fp values from the constant pool.
A new option (`-loongarch-materialize-float-imm=<enum>`) is
added to set the maximum number of instructions (including the
code sequence that generates the value and the move of the value
to an FPR) allowed when materializing floating-point immediates.
The default value of the option is `3` on both LA32 and LA64,
which means:
- For `f32` on both LA32 and LA64: `2 insts + movgr2fr.w`
  (covers all `f32` values);
- For `f64` on LA64: `2 insts + movgr2fr.d`;
- For `f64` on LA32: `1 inst + movgr2fr.w + movgr2frh.w`
  (same inst latency as using the constant pool).
The option can be set in the range `0,2-6`. (6 behaves the same
as 5 on LA64.)
|
|
computeKnownBits/ComputeNumSignBits (#159692)
Resolves #158332
|
|
X86ISD::VPMADD52L/H handling - again (#159230)
Fixes #155386
|
|
Added an extra check in AArch64RegisterBankInfo.cpp to mark
llvm.aarch64.sisd.fcvtxn as having floating-point operands.
|
|
|
|
Fixes [157370](https://github.com/llvm/llvm-project/issues/157370)
UREM General proof: https://alive2.llvm.org/ce/z/b_GQJX
SREM General proof: https://alive2.llvm.org/ce/z/Whkaxh
I have added the tests as rv32i and rv64i because those are the only configurations where I could verify that it works.
|
|
|
|
(#154918)
Note: This is only worse for `v8i32/v8f32/v4i64/v4f64` types when the high
part has only one non-undef element. Skip splitting to avoid this.
|
|
As far as I can tell, this load pattern will never match anything, as it
could only trigger for an i32 MemVT extended to an i32.
|
|
inserting per element (#154533)
|
|
vec-regs (#159568)
Fixes #159529
|
|
https://sourceware.org/bugzilla/show_bug.cgi?id=21120
|
|
(#159765)
Fixes #159571
|
|
RISCVMatInt. NFC (#159864)
I think this better reflects the intent of the modification. In all these
places we know bit 31 is 1, so we are sign extending.
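As a hedged illustration of the idiom (not the exact diff): when bit 31 is known to be set, OR-ing in the upper 32 bits and sign-extending from bit 31 produce the same value, and a sign extension states the intent directly:
```cpp
#include <cassert>
#include <cstdint>

// Mirrors the effect of llvm::SignExtend64<32> from
// llvm/Support/MathExtras.h for this case.
int64_t signExtendFromBit31(uint64_t Val) {
  assert((Val & 0x80000000u) && "bit 31 expected to be set");
  // Equivalent to Val | 0xffffffff00000000ULL when bit 31 is 1, but
  // spells out that this is a sign extension.
  return static_cast<int32_t>(static_cast<uint32_t>(Val));
}
```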
|
|
NFC (#159965)
|
|
|
|
Without this patch, we compute a type trait in a roundabout manner:
- Compute a boolean value in the primary template.
- Pass the value to std::enable_if_t.
- Return std::true_type (or std::false_type on the fallback path).
- Compare the return type to std::true_type.
That is, when the expression for the first boolean value above is well
formed, we already have the answer we are looking for.
This patch bypasses the entire sequence by having the primary template
return std::bool_constant and adjusting RESULT to extract the ::value
of the boolean type.
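A minimal sketch of the shape described above (hypothetical trait and member names, not the actual LLVM code):
```cpp
#include <type_traits>
#include <utility>

// Before: compute a bool, feed it to std::enable_if_t, return
// std::true_type, then compare the deduced return type to std::true_type.
template <typename T, typename = std::enable_if_t<std::is_same_v<
                          decltype(std::declval<T &>().getHash()), unsigned>>>
std::true_type hasCachedHashImpl(int);
template <typename> std::false_type hasCachedHashImpl(...);

// After: when the boolean expression is well formed we already have the
// answer, so return std::bool_constant and read ::value off the result.
template <typename T>
std::bool_constant<
    std::is_same_v<decltype(std::declval<T &>().getHash()), unsigned>>
hasCachedHashImpl2(int);
template <typename> std::false_type hasCachedHashImpl2(...);

template <typename T>
constexpr bool HasCachedHash = decltype(hasCachedHashImpl2<T>(0))::value;
```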
|
|
I find it very confusing that we have two different kinds of
"immediates":
- MCOperands in the backend that are `isImm()` which can only be numbers
- RISCVOperands in the parser that are `isImm()` which can contain
expressions
This change aims to make it clearer that in the AsmParser, we are
dealing with expressions, rather than just numbers.
Unfortunately, `isImm` comes from the `MCParsedAsmOperand`, which is
needed for Microsoft Inline Asm, so we cannot fully get rid of it.
|
|
Fixes #159912
|
|
This ensures each scalarized member has an accurate cost, matching the
cost it would have if it had not been considered for an interleave
group.
|
|
When building an assert-enabled target, silence the following:
```
C:\git\llvm-project\llvm\include\llvm/Analysis/DependenceAnalysis.h(290): warning C4018: '<=': signed/unsigned mismatch
```
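The usual shape of such a fix, as a hypothetical sketch:
```cpp
#include <cstddef>

bool levelWithinDepth(int Level, std::size_t Depth) {
  // warning C4018: '<=': signed/unsigned mismatch
  //   return Level <= Depth;
  // Fixed: compare in a single signedness, guarding the negative case.
  return Level >= 0 && static_cast<std::size_t>(Level) <= Depth;
}
```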
|
|
For UDiv/SDiv with invariant divisors, the created selects will be
hoisted out. Don't compute their cost for each iteration, to match the
more accurate VPlan-based cost modeling.
Fixes https://github.com/llvm/llvm-project/issues/159402.
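A hypothetical source-level sketch of the shape in question:
```cpp
// 'D' does not change inside the loop; per the commit above, the
// selects created when such a divide is predicated are hoisted out of
// the loop, so their cost should not be charged per iteration.
int sumOfQuotients(const int *A, int N, int D) {
  int Sum = 0;
  for (int I = 0; I < N; ++I)
    Sum += A[I] / D; // SDiv with a loop-invariant divisor
  return Sum;
}
```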
|
|
|
|
The code is taken from `SelectionDAG::computeKnownBits`.
This ticks off ABS from #150515
|
|
Loads of addresses are scalarized and have their costs computed without
scalarization overhead. Consistently apply this logic also to
non-uniform loads that are already scalarized, to ensure their costs are
consistent with other scalarized loads that are used as addresses.
|
|
We were hitting an assert discovered in https://github.com/llvm/llvm-project/pull/157768#issuecomment-3315359832
|
|
The kernel's libbpf does not need the .comment section. If it is not
filtered out in LLVM, the section will be filtered out in libbpf, so let
us filter it out as early as possible, which is in LLVM.
The following is an example.
$ cat t.c
int test() { return 5; }
Without this change:
$ llvm-readelf -S t.o
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .strtab STRTAB 0000000000000000 000110 000047 00 0 0 1
[ 2] .text PROGBITS 0000000000000000 000040 000010 00 AX 0 0 8
[ 3] .comment PROGBITS 0000000000000000 000050 000072 01 MS 0 0 1
[ 4] .note.GNU-stack PROGBITS 0000000000000000 0000c2 000000 00 0 0 1
[ 5] .llvm_addrsig LLVM_ADDRSIG 0000000000000000 000110 000000 00 E 6 0 1
[ 6] .symtab SYMTAB 0000000000000000 0000c8 000048 18 1 2 8
With this change:
$ llvm-readelf -S t.o
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .strtab STRTAB 0000000000000000 000098 00003e 00 0 0 1
[ 2] .text PROGBITS 0000000000000000 000040 000010 00 AX 0 0 8
[ 3] .note.GNU-stack PROGBITS 0000000000000000 000050 000000 00 0 0 1
[ 4] .llvm_addrsig LLVM_ADDRSIG 0000000000000000 000098 000000 00 E 5 0 1
[ 5] .symtab SYMTAB 0000000000000000 000050 000048 18 1 2 8
|
|
|
|
Follow-up to https://reviews.llvm.org/D46527
|
|
This patch simplifies dispatchRecalculateHash and dispatchResetHash
with "constexpr if".
This patch does not inline dispatchRecalculateHash and
dispatchResetHash into their respective call sites. Using "constexpr
if" in a non-template context like MDNode::uniquify would still
require the discarded branch to be syntactically valid, causing a
compilation error for node types that do not have
recalculateHash/setHash. Using template functions ensures that the
"constexpr if" is evaluated in a proper template context, allowing the
compiler to fully discard the inactive branch.
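A minimal sketch of the distinction (hypothetical node types, not the actual MDNode code):
```cpp
#include <type_traits>
#include <utility>

struct WithHash { void recalculateHash() {} };
struct WithoutHash {};

// Hypothetical detection trait for the sketch.
template <typename, typename = void>
struct HasRecalculateHash : std::false_type {};
template <typename T>
struct HasRecalculateHash<
    T, std::void_t<decltype(std::declval<T &>().recalculateHash())>>
    : std::true_type {};

// Inside a template, the discarded branch of "constexpr if" is never
// instantiated, so the call only has to be valid for types that take it.
template <typename NodeT> void dispatchRecalculateHash(NodeT &N) {
  if constexpr (HasRecalculateHash<NodeT>::value)
    N.recalculateHash();
}

// In a non-template context the discarded branch must still be valid, so
// writing the "constexpr if" directly against a concrete type without
// recalculateHash() would not compile; hence the template helper.
void example(WithHash &A, WithoutHash &B) {
  dispatchRecalculateHash(A); // calls recalculateHash()
  dispatchRecalculateHash(B); // fine; the branch is discarded
}
```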
|
|
This patch modernizes HasCachedHash.
- "struct SFINAE" is replaced with identically defined SameType.
- The return types Yes and No are replaced with std::true_type and
std::false_type.
My previous attempt (#159510) to clean up HasCachedHash failed on
clang++-18, but this version works with clang++-18.
|
|
There are a couple of places during function cloning where we may create
new callsite clone nodes. One of those places was correctly propagating
the assignment to which function clone it should call, and one was not.
Refactor this handling into a helper and use it in both places so the
newly created callsite clones actually call the assigned callee function
clones.
|
|
comparisons with `ConstantFPRange` (#159315)
Follow-up to #158097
Similar to `simplifyAndOrOfICmpsWithConstants`, we can do so for
floating point comparisons.
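As a small source-level illustration of the idea (hypothetical, C++ rather than IR): ordered comparisons against constants describe ranges, and when the ranges are disjoint the conjunction can fold to a constant:
```cpp
// X < 1.0 and X > 2.0 describe disjoint ranges, so the conjunction can
// fold to 'false'; for NaN both ordered compares are already false, so
// the fold remains correct.
bool inDisjointRanges(float X) { return X < 1.0f && X > 2.0f; } // -> false
```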
|
|
Alive2: https://alive2.llvm.org/ce/z/8rX5Rk
Closes https://github.com/llvm/llvm-project/issues/118106.
|
|
|
|
Fixes regression after e5bbaa9c8fb6e06dbcbd39404039cc5d31df4410.
e5500 accidentally still had the 64bit feature applied instead of
64bit-support.
|
|
(#159839)
LiveRangeEdit's rematerialization checking logic is used in two quite
different ways. For SplitKit and InlineSpiller, we're analyzing all defs
associated with a live interval, doing that analysis up front, and then
using the result a bit later. In the RegisterCoalescer, we're analyzing
exactly one ValNo at a time and using the legality result immediately.
LRE had a checkRematerializable which existed basically to adapt the
latter into the former usage model.
Instead, this change bypasses the ScannedRemat and Remattable
structures and directly queries the underlying routines. This is easier
to read, and makes it clearer which uses actually need the deferred
analysis. (A following change may try to unwind that too, but it's not
strictly NFC.)
|
|
Summary:
This patch has broken the `libc` build bot. I could work around that but
the changes seem unnecessary.
This reverts commit 9ba844eb3a21d461c3adc7add7691a076c6992fc.
|
|
Different instructions are used for the 32-bit and 64-bit cases
anyway, so directly use the concrete register class in the
instruction.
|
|
STI exists in the base class, use it instead.
Fixes #159862.
|
|
Fixes #148052.
The last PR did not account for the scenario where more than one
instruction used the `catchpad` label.
In that case I deleted uses which had already been chosen to be
iterated over by the early-increment iterator. This issue was not
visible in a normal release build on x86, but luckily the address
sanitizer build later caught it on the buildbot.
Here is the diff from the last version of this PR: #158435
```diff
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 91e245e5e8f5..1dd8cb4ee584 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -106,7 +106,8 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
// first block, the we would have possible cleanupret and catchret
// instructions with poison arguments, which wouldn't be valid.
if (isa<FuncletPadInst>(I)) {
- for (User *User : make_early_inc_range(I.users())) {
+ SmallPtrSet<BasicBlock *, 4> UniqueEHRetBlocksToDelete;
+ for (User *User : I.users()) {
Instruction *ReturnInstr = dyn_cast<Instruction>(User);
// If we have a cleanupret or catchret block, replace it with just an
// unreachable. The other alternative, that may use a catchpad is a
@@ -114,33 +115,12 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
if (isa<CatchReturnInst>(ReturnInstr) ||
isa<CleanupReturnInst>(ReturnInstr)) {
BasicBlock *ReturnInstrBB = ReturnInstr->getParent();
- // This catchret or catchpad basic block is detached now. Let the
- // successors know it.
- // This basic block also may have some predecessors too. For
- // example the following LLVM-IR is valid:
- //
- // [cleanuppad_block]
- // |
- // [regular_block]
- // |
- // [cleanupret_block]
- //
- // The IR after the cleanup will look like this:
- //
- // [cleanuppad_block]
- // |
- // [regular_block]
- // |
- // [unreachable]
- //
- // So regular_block will lead to an unreachable block, which is also
- // valid. There is no need to replace regular_block with unreachable
- // in this context now.
- // On the other hand, the cleanupret/catchret block's successors
- // need to know about the deletion of their predecessors.
- emptyAndDetachBlock(ReturnInstrBB, Updates, KeepOneInputPHIs);
+ UniqueEHRetBlocksToDelete.insert(ReturnInstrBB);
}
}
+ for (BasicBlock *EHRetBB :
+ make_early_inc_range(UniqueEHRetBlocksToDelete))
+ emptyAndDetachBlock(EHRetBB, Updates, KeepOneInputPHIs);
}
}
```
|
|
Currently MCA takes instruction properties from the scheduling model.
However, some instructions may execute differently depending on external
factors - for example, the latency of memory instructions may vary
depending on whether the load comes from the L1 cache, L2, or DRAM.
While MCA as a static analysis tool cannot model such differences
(and currently makes some static decision, e.g. all memory ops are
treated as L1 accesses), it makes sense to allow manual modification of
instruction properties to model different behavior (e.g. the sensitivity
of code performance to cache misses in a particular load instruction).
This patch addresses this need.
The library modification is intentionally generic - arbitrary
modifications to InstrDesc are allowed. The tool support is currently
limited to changing instruction latencies (a single number that applies
to all output arguments and MaxLatency) via comments in the input
assembler code; the format is like this:
    add (%eax), eax // LLVM-MCA-LATENCY:100
Users of the MCA library can already make additional customizations; the
command line tool can be extended in the future.
Note that InstructionView currently shows per-instruction information
according to the scheduling model and is not affected by this change.
See https://github.com/llvm/llvm-project/issues/133429 for additional
clarifications (including an explanation of why the existing
customization mechanisms do not provide the required functionality).
---------
Co-authored-by: Min-Yih Hsu <min@myhsu.dev>
|
|
The split in this code path was left over from when we had to support
the old PM and the new PM at the same time. Now that the legacy pass has
been dropped, this simplifies the code a little bit and swaps pointers
for references in a couple places.
Reviewers: aeubanks, efriedma-quic, wlei-llvm
Reviewed By: aeubanks
Pull Request: https://github.com/llvm/llvm-project/pull/159858
|
|
after LUI now. NFC (#159829)
The simm32 base case only uses lui+addiw when necessary after
3d2650bdeb8409563d917d8eef70b906323524ef
The worst case 8 instruction sequence doesn't leave a full 32 bits for
the LUI+ADDI(W) after the 3 12-bit ADDI and SLLI pairs are created. So
we will never generate LUI+ADDIW in the worst case sequence.
|
|
This patch introduces a new optimization in SROA that handles the
pattern where multiple non-overlapping vector `store`s completely fill
an `alloca`.
The current approach to handle this pattern introduces many `.vecexpand`
and `.vecblend` instructions, which can dramatically slow down
compilation when dealing with large `alloca`s built from many small
vector `store`s. For example, consider an `alloca` of type `<128 x
float>` filled by 64 `store`s of `<2 x float>` each. The current
implementation requires:
- 64 `shufflevector`s (`.vecexpand`)
- 64 `select`s (`.vecblend`)
- All operations use masks of size 128
- These operations form a long dependency chain
This kind of IR is both difficult to optimize and slow to compile,
particularly impacting the `InstCombine` pass.
This patch introduces a tree-structured merge approach that
significantly reduces the number of operations and improves compilation
performance.
Key features:
- Detects when vector `store`s completely fill an `alloca` without gaps
- Ensures no loads occur in the middle of the store sequence
- Uses a tree-based approach with `shufflevector`s to merge stored
values
- Reduces the number of intermediate operations compared to linear
merging
- Eliminates the long dependency chains that hurt optimization
Example transformation:
```
// Before: (stores do not have to be in order)
%alloca = alloca <8 x float>
store <2 x float> %val0, ptr %alloca ; offset 0-1
store <2 x float> %val2, ptr %alloca+16 ; offset 4-5
store <2 x float> %val1, ptr %alloca+8 ; offset 2-3
store <2 x float> %val3, ptr %alloca+24 ; offset 6-7
%result = load <8 x float>, ptr %alloca
// After (tree-structured merge):
%shuffle0 = shufflevector %val0, %val1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%shuffle1 = shufflevector %val2, %val3, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%result = shufflevector %shuffle0, %shuffle1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
```
Benefits:
- Logarithmic depth (O(log n)) instead of linear dependency chains
- Fewer total operations for large vectors
- Better optimization opportunities for subsequent passes
- Significant compilation time improvements for large vector patterns
For some large cases, the compile time can be reduced from about 60s to
less than 3s.
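A sketch of the tree-reduction shape (generic stand-ins; in SROA the merge step would emit a `shufflevector` concatenating two stored pieces):
```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Piece = std::vector<float>; // stand-in for a stored vector value

// Stand-in for the shufflevector that concatenates two pieces.
static Piece mergePair(const Piece &A, const Piece &B) {
  Piece R(A);
  R.insert(R.end(), B.begin(), B.end());
  return R;
}

// Merge N non-overlapping pieces in O(log N) rounds instead of a linear
// chain of N-1 dependent expand/blend operations.
static Piece treeMerge(std::vector<Piece> Pieces) {
  while (Pieces.size() > 1) {
    std::vector<Piece> Next;
    for (std::size_t I = 0; I + 1 < Pieces.size(); I += 2)
      Next.push_back(mergePair(Pieces[I], Pieces[I + 1]));
    if (Pieces.size() % 2) // an odd piece carries over to the next round
      Next.push_back(std::move(Pieces.back()));
    Pieces = std::move(Next);
  }
  return Pieces.front();
}
```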
---------
Co-authored-by: chengjunp <chengjunp@nividia.com>
|
|
When there is a dependency between two memory instructions in separate loops that have the same iteration space and depth, SIV will be able to test them and compute the direction and the distance of the dependency.
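As a hypothetical source-level example: both loops below have the same iteration space and depth; the first stores `A[I + 2]` and the second loads `A[I]`, so an SIV test can establish the dependence with distance 2 and a known direction:
```cpp
// Same trip count and nesting depth in both loops; the second loop
// reads what the first loop wrote two iterations earlier (distance 2).
void copyThroughA(float *A, const float *B, float *C, int N) {
  for (int I = 0; I < N; ++I)
    A[I + 2] = B[I];
  for (int I = 0; I < N; ++I)
    C[I] = A[I];
}
```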
|