path: root/llvm/lib/CodeGen/AtomicExpandPass.cpp
Age | Commit message | Author | Files | Lines
2025-09-02  support branch hint for AtomicExpandImpl::expandAtomicCmpXchg (#152366)  (zhijian lin, 1 file, -3/+7)
The patch adds a branch hint to AtomicExpandImpl::expandAtomicCmpXchg. For example, PowerPC supports branch hints, where `-` hints not taken and `+` hints taken:
```
loop:
  lwarx  r6,0,r3   # load and reserve
  cmpw   r4,r6     # 1st 2 operands equal?
  bne-   exit      # skip if not
  stwcx. r5,0,r3   # store new value if still reserved
  bne-   loop      # loop if lost reservation
exit:
  mr     r4,r6     # return value from storage
```
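The effect of such hints can be approximated at the source level. A minimal C++ sketch (the `cas_add` helper and its hint placement are illustrative, not from the patch), using the GCC/Clang `__builtin_expect` builtin to mark the CAS failure path unlikely, much like the `bne-` hints above:

```cpp
#include <atomic>
#include <cstdint>

// Illustrative sketch: a compare-and-swap retry loop where the compiler is
// told the failure path is unlikely, mirroring the `bne-` (not-taken)
// hints in the PowerPC sequence above.
inline uint32_t cas_add(std::atomic<uint32_t> &v, uint32_t amount) {
  uint32_t old = v.load(std::memory_order_relaxed);
  // Losing the race (CAS failure) is expected to be rare, so hint it.
  while (__builtin_expect(
             !v.compare_exchange_weak(old, old + amount,
                                      std::memory_order_seq_cst,
                                      std::memory_order_relaxed),
             0)) {
    // `old` was reloaded by compare_exchange_weak; just retry.
  }
  return old; // value from storage before the update
}
```

The hint only shapes code layout and static prediction; the loop's correctness does not depend on it.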
2025-08-28  [CodeGen][TLI] Allow targets to custom expand atomic load/stores (#154708)  (Pierre van Houtryve, 1 file, -5/+11)
Loads didn't have the `Expand` option in `AtomicExpandPass`. Stores had `Expand`, but it didn't defer to TLI and instead performed an action directly. Add a `CustomExpand` option and make it always map to the TLI hook in all cases. The `Expand` option now refers to a generic expansion for all targets.
2025-08-08  [IR] Remove size argument from lifetime intrinsics (#150248)  (Nikita Popov, 1 file, -7/+6)
Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).
2025-07-09  AtomicExpand: Stop using report_fatal_error (#147300)  (Matt Arsenault, 1 file, -3/+14)
Emit a context error and delete the instruction. This allows removing the AMDGPU hack where some atomic libcalls are falsely added. NVPTX also later copied the same hack, so remove it there too. For now just emit the generic error, which is not good. It's missing any useful context information (despite taking the instruction). It's also confusing in the failed atomicrmw case, since it's reporting failure at the intermediate failed cmpxchg instead of the original atomicrmw.
2025-06-08  [AtomicExpandPass] Match isIdempotentRMW with InstcombineRMW (#142277)  (AZero13, 1 file, -3/+10)
Add umin, smin, umax, smax to isIdempotentRMW.
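For context, an RMW operation is idempotent when its constant operand leaves every value unchanged, so the atomicrmw can be treated as a plain atomic load. A hedged C++ sketch of the check (the enum and function here are illustrative, not the LLVM implementation), including the newly added min/max cases:

```cpp
#include <cstdint>
#include <limits>

// An atomicrmw <op> with constant C is idempotent when `x op C == x` for
// every x. The new cases from this change are umin/smin/umax/smax.
enum class RMWOp { Add, Or, Xor, And, UMin, UMax, SMin, SMax };

inline bool isIdempotentRMW(RMWOp op, int32_t c) {
  switch (op) {
  case RMWOp::Add:
  case RMWOp::Or:
  case RMWOp::Xor:
    return c == 0;  // x + 0, x | 0, x ^ 0
  case RMWOp::And:
    return c == -1; // x & ~0
  case RMWOp::UMin: // umin(x, UINT_MAX)
    return uint32_t(c) == std::numeric_limits<uint32_t>::max();
  case RMWOp::UMax: // umax(x, 0)
    return c == 0;
  case RMWOp::SMin: // smin(x, INT_MAX)
    return c == std::numeric_limits<int32_t>::max();
  case RMWOp::SMax: // smax(x, INT_MIN)
    return c == std::numeric_limits<int32_t>::min();
  }
  return false;
}
```

When the check fires, the pass can drop the write side entirely and keep only an atomic load of the location.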
2025-04-30  Reland [llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#137701)  (Jonathan Thackray, 1 file, -0/+4)
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum` instructions. These mirror the `llvm.maximum.*` and `llvm.minimum.*` intrinsics, but are atomic and use IEEE 754-2019 handling for NaNs, which is different to `fmax` and `fmin`. See https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic for more details. Future changes will allow this LLVM IR to be lowered to specialised assembler instructions on suitable targets, such as AArch64.
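The NaN-handling difference can be shown in plain C++: IEEE 754-2019 `maximum` propagates NaN (and orders -0.0 below +0.0), whereas `fmax` returns the non-NaN operand. The `ieee_maximum` helper below is an illustrative reimplementation of those semantics, not code from the patch:

```cpp
#include <cmath>
#include <limits>

// IEEE 754-2019 maximum: NaN in, NaN out; +0.0 is treated as greater
// than -0.0. Contrast with std::fmax, which ignores a NaN operand.
inline double ieee_maximum(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN(); // NaN propagates
  if (a == 0.0 && b == 0.0)
    return std::signbit(a) ? b : a; // prefer +0.0 over -0.0
  return a > b ? a : b;
}
```

atomicrmw `fmaximum`/`fminimum` use this propagating behaviour, while the existing atomicrmw `fmax`/`fmin` follow the `fmax`/`fmin` libm behaviour.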
2025-04-28  Revert "[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions" (#137657)  (Jonathan Thackray, 1 file, -4/+0)
Reverts llvm/llvm-project#136759 due to a bad interaction with c792b25e4.
2025-04-28  [llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#136759)  (Jonathan Thackray, 1 file, -0/+4)
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum` instructions. These mirror the `llvm.maximum.*` and `llvm.minimum.*` intrinsics, but are atomic and use IEEE 754-2019 handling for NaNs, which is different to `fmax` and `fmin`. See https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic for more details. Future changes will allow this LLVM IR to be lowered to specialised assembler instructions on suitable targets, such as AArch64.
2025-02-24  [NVPTX] Support for memory orderings for cmpxchg (#126159)  (Akshay Deodhar, 1 file, -2/+4)
So far, all cmpxchg instructions were lowered to atom.cas. This change adds support for memory orders in lowering. Specifically:
- For cmpxchg which are emulated, memory ordering is enforced by adding fences around the emulation loops.
- For cmpxchg which are lowered to PTX directly, where the memory order is supported in PTX, lower directly to the correct PTX instruction.
- For seq_cst cmpxchg which are lowered to PTX directly, use a sequence (fence.sc; atom.cas.acquire) to provide the semantics that we want.

Also adds tests for all possible combinations of (size, memory ordering, address space, SM/PTX versions). This also adds `atomicOperationOrderAfterFenceSplit` in TargetLowering, for specially handling seq_cst atomics.
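The seq_cst sequence described above can be sketched as a C++ analog: a full fence followed by an acquire compare-exchange. This is a rough shape of the lowering, not NVPTX code, and the function name is made up:

```cpp
#include <atomic>

// Rough analog of (fence.sc; atom.cas.acquire): the standalone seq_cst
// fence supplies the store-side ordering, so the CAS itself only needs
// acquire semantics.
inline bool seqcst_cas(std::atomic<int> &v, int &expected, int desired) {
  std::atomic_thread_fence(std::memory_order_seq_cst); // fence.sc
  return v.compare_exchange_strong(expected, desired,
                                   std::memory_order_acquire,  // atom.cas.acquire
                                   std::memory_order_acquire);
}
```

On failure, `expected` is updated to the value observed, matching the usual compare-exchange contract.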
2024-11-13  AtomicExpand: Preserve metadata when bitcasting fp atomicrmw xchg (#115240)  (Matt Arsenault, 1 file, -0/+1)
2024-11-12  [CodeGen] Remove unused includes (NFC) (#115996)  (Kazu Hirata, 1 file, -2/+0)
Identified with misc-include-cleaner.
2024-11-04  AMDGPU: Custom expand flat cmpxchg which may access private (#109410)  (Matt Arsenault, 1 file, -0/+4)
64-bit flat cmpxchg instructions do not work correctly for scratch addresses and need to be expanded as non-atomic. Allow custom expansion of cmpxchg in AtomicExpand, as is already the case for atomicrmw.
2024-10-31  AtomicExpand: Copy metadata from atomicrmw to cmpxchg (#109409)  (Matt Arsenault, 1 file, -40/+51)
When expanding an atomicrmw with a cmpxchg, preserve any metadata attached to it. This will avoid unwanted double expansions in a future commit. The initial load should also probably receive the same metadata (which for some reason is not emitted as an atomic).
2024-09-20  AtomicExpand: Really allow incremental legalization (#108613)  (Matt Arsenault, 1 file, -10/+3)
Fix up 100d9b89947bb1d42af20010bb594fa4c02542fc. The iterator fixes ended up defeating the point, since newly inserted blocks were not visited. This never erases the current block, so we can simply not preincrement the block iterator. The AArch64 FP atomic tests now expand the cmpxchg in the second round of legalization.
2024-09-06  Add usub_cond and usub_sat operations to atomicrmw (#105568)  (anjenner, 1 file, -2/+6)
These both perform conditional subtraction, returning the minuend and zero respectively if the difference is negative.
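The scalar semantics described above can be sketched directly; these helpers are illustrative, not the LLVM lowering:

```cpp
#include <cstdint>

// Both operations subtract conditionally. If the difference would be
// negative, usub_cond returns the minuend unchanged and usub_sat
// returns zero.
inline uint32_t usub_cond(uint32_t x, uint32_t y) {
  return x >= y ? x - y : x; // subtract only when it does not underflow
}

inline uint32_t usub_sat(uint32_t x, uint32_t y) {
  return x >= y ? x - y : 0; // clamp to zero on underflow
}
```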
2024-09-06  Reapply "AtomicExpand: Allow incrementally legalizing atomicrmw" (#107307)  (Matt Arsenault, 1 file, -11/+24)
This reverts commit 63da545ccdd41d9eb2392a8d0e848a65eb24f5fa. Use reverse iteration in the instruction loop to avoid sanitizer errors. This also has the side effect of avoiding the AArch64 codegen quality regressions. Closes #107309.
2024-09-04  Revert "Reland "AtomicExpand: Allow incrementally legalizing atomicrmw"" (#107307)  (Vitaly Buka, 1 file, -24/+11)
Reverts llvm/llvm-project#106793. `Next == E` is not enough (https://lab.llvm.org/buildbot/#/builders/169/builds/2834); `Next` is deleted by `processAtomicInstr`.
2024-09-04  Reland "Revert "AtomicExpand: Allow incrementally legalizing atomicrmw"" (#106793)  (Vitaly Buka, 1 file, -11/+24)
Reverts llvm/llvm-project#106792. The first commit of the PR is a pure revert; the rest is a possible fix.
2024-08-30  Revert "AtomicExpand: Allow incrementally legalizing atomicrmw" (#106792)  (Vitaly Buka, 1 file, -24/+11)
Reverts llvm/llvm-project#103371. There is a `heap-use-after-free`, commented on 206b5aff44a95754f6dd7a5696efa024e983ac59. Maybe `if (Next == E || BB != Next->getParent()) {` is enough, but it's not clear what the intent was there.
2024-08-30  AtomicExpand: Allow incrementally legalizing atomicrmw (#103371)  (Matt Arsenault, 1 file, -11/+24)
If a lowering changed control flow, resume the legalization loop at the first newly inserted block. This will allow incrementally legalizing atomicrmw and cmpxchg. The AArch64 test might be a bugfix: previously it would lower the vector FP case as a cmpxchg loop, and now the resulting cmpxchgs get lowered too, where previously they weren't. Maybe it shouldn't be reporting cmpxchg as the expand type in the first place, though.
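The resume-at-the-first-new-element idea generalizes beyond this pass: when processing an item replaces it with new items that may themselves be illegal, continue the scan at the first replacement rather than skipping past it. A toy C++ illustration (nothing here is the pass's actual code; "legal" is stood in for by a size limit):

```cpp
#include <list>
#include <vector>

// Toy model: values > 9 are "illegal" and get split into two halves,
// which may need further splitting. Resuming iteration at the first newly
// inserted element makes legalization incremental.
inline std::vector<int> legalizeAll(std::list<int> work) {
  for (auto it = work.begin(); it != work.end();) {
    if (*it > 9) {
      int v = *it;
      it = work.erase(it);             // drop the illegal element
      it = work.insert(it, v - v / 2); // second half
      it = work.insert(it, v / 2);     // first half; resume here
    } else {
      ++it;                            // already legal, move on
    }
  }
  return {work.begin(), work.end()};
}
```

The commits above show how fragile the iterator bookkeeping is in practice: erasing the current element invalidates naive `Next` handling, which is what the revert cycle was about.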
2024-08-14  AtomicExpand: Add assert that atomicrmw is an xchg  (Matt Arsenault, 1 file, -0/+2)
It turns out it's trivial to hit this path with any rmw operation.
2024-08-13  AtomicExpand: Refactor atomic instruction handling (#102914)  (Matt Arsenault, 1 file, -125/+143)
Move the processing of an instruction into a helper function. Also avoid redundant checking for all types of atomic instructions. Including the assert, it was effectively performing the same check three times.
2024-08-02  [llvm] Make InstSimplifyFolder constructor explicit (NFC) (#101654)  (Sergei Barannikov, 1 file, -1/+1)
2024-07-20  Reapply "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)"  (Joseph Huber, 1 file, -1/+1)
This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5. I moved the `ISD` dependencies into the CodeGen portion of the handling; it's a little awkward, but it's the easiest solution I can think of for now.
2024-07-20  Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)"  (NAKAMURA Takumi, 1 file, -1/+1)
This reverts commit c05126bdfc3b02daa37d11056fa43db1a6cdef69 (llvmorg-19-init-17714-gc05126bdfc3b). See #99610.
2024-07-16  [LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)  (Joseph Huber, 1 file, -1/+1)
Summary: The LTO pass and the LLD linker have logic in them that forces extraction and prevents internalization of needed runtime calls. However, these currently take all RTLib calls into account, even if the target does not support them. The target opts out of a libcall if it sets its name to nullptr. This patch pulls this logic out into a class in the header so that LTO/lld can use it to determine whether a symbol actually needs to be kept. This is important for targets like AMDGPU that want to be able to use `lld` to perform the final link step but do not want the overhead of uncalled functions (this trivially adds around a second to the link time).
2024-06-28  [IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)  (Nikita Popov, 1 file, -2/+2)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, replacing the current `getParent()->getDataLayout()` pattern.
2024-06-27  [IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)  (Nikita Popov, 1 file, -5/+5)
This is a helper to avoid writing `getModule()->getDataLayout()`. I regularly try to use this method only to remember it doesn't exist. `getModule()->getDataLayout()` is also a common (the most common?) reason why code has to include the Module.h header.
2024-06-24  Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"  (Stephen Tozer, 1 file, -4/+4)
Reverts the above commit, as it updates a common header function and did not update all call sites: https://lab.llvm.org/buildbot/#/builders/29/builds/382. This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24  [IR][NFC] Update IRBuilder to use InsertPosition (#96497)  (Stephen Tozer, 1 file, -4/+4)
Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on, as we look to remove the `Instruction *InsertBefore` argument from instruction creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.
2024-06-12  AtomicExpand: Fix creating invalid ptrmask for fat pointers (#94955)  (Matt Arsenault, 1 file, -1/+1)
The ptrmask intrinsic requires the integer mask to be the index size, not the pointer size.
2024-05-23  AtomicExpand: Preserve metadata when expanding partword RMW (#89769)  (Matt Arsenault, 1 file, -1/+33)
This will be important for AMDGPU in a future patch.
2024-05-07  AMDGPU: Do not bitcast atomicrmw in IR (#90045)  (Matt Arsenault, 1 file, -2/+3)
This is the first step to eliminating shouldCastAtomicRMWIInIR. This and the other atomic expand casting hooks should be removed. This adds duplicate legalization machinery and interfaces. This is already what codegen is supposed to do, and already does for the promotion case. In the case of atomicrmw xchg, there seems to be some benefit to having the bitcasts moved outside of the cmpxchg loop on targets with separate int and FP registers, which we should be able to deal with by directly checking for the legality of the underlying operation. The casting path was also losing metadata when it recreated the instruction.
2024-04-24  AtomicExpand: Fix dropping a syncscope when bitcasting atomicrmw  (Matt Arsenault, 1 file, -2/+3)
2024-04-24  [IR] Memory Model Relaxation Annotations (#78569)  (Pierre van Houtryve, 1 file, -2/+15)
Implements the core/target-agnostic components of Memory Model Relaxation Annotations. RFC: https://discourse.llvm.org/t/rfc-mmras-memory-model-relaxation-annotations/76361/5
2024-04-23  AtomicExpand: Emit or with constant on RHS  (Matt Arsenault, 1 file, -1/+1)
This will save later code from commuting it.
2024-04-06  [RFC] IR: Support atomicrmw FP ops with vector types (#86796)  (Matt Arsenault, 1 file, -3/+3)
Allow using atomicrmw fadd, fsub, fmin, and fmax with vectors of floating-point type. AMDGPU supports atomic fadd for <2 x half> and <2 x bfloat> on some targets and address spaces. Note this only supports the proper floating-point operations; float vector typed xchg is still not supported. cmpxchg still only supports integers, so this inserts bitcasts for the loop expansion. I have support for fp vector typed xchg, and vector of int/ptr, separately implemented, but I don't have an immediate need for those beyond feature consistency.
2024-03-29  [FPEnv][AtomicExpand] Correct strictfp attribute handling in AtomicExpandPass (#87082)  (Kevin P. Neal, 1 file, -0/+3)
The AtomicExpand pass was lowering function calls with the strictfp attribute to sequences that included function calls incorrectly lacking the attribute. This patch corrects that. The pass now also emits the correct constrained FP call instead of normal FP instructions when in a function with the strictfp attribute. Test changes verified with D146845.
2024-02-25  [CodeGen] Port AtomicExpand to new Pass Manager (#71220)  (Rishabh Bali, 1 file, -58/+85)
Port the `atomicexpand` pass to the new Pass Manager. Fixes #64559.
2024-02-07  [AtomicExpand][RISCV] Call shouldExpandAtomicRMWInIR before widenPartwordAtomicRMW (#80947)  (Craig Topper, 1 file, -16/+23)
This gives the target a chance to keep an atomicrmw op that is smaller than the minimum cmpxchg size. This is needed to support the Zabha extension for RISC-V, which provides i8/i16 atomicrmw operations but does not provide i8/i16 cmpxchg or LR/SC instructions. This moves the widening until after the target requests LLSC/CmpXChg/MaskedIntrinsic expansion. Once we widen, we call shouldExpandAtomicRMWInIR again to give the target another chance to make a decision about the widened operation. I considered making the targets return AtomicExpansionKind::Expand or a new expansion kind for And/Or/Xor, but that required the targets to special case And/Or/Xor, which they weren't currently doing.
2023-08-10  [llvm] Drop some bitcasts and references related to typed pointers  (Bjorn Pettersson, 1 file, -32/+11)
Differential Revision: https://reviews.llvm.org/D157551
2023-08-08  AtomicExpand: Preserve syncscope when expanding partword atomics  (Matt Arsenault, 1 file, -3/+4)
2023-08-03  [llvm] Drop some typed pointer handling/bitcasts  (Bjorn Pettersson, 1 file, -12/+4)
Differential Revision: https://reviews.llvm.org/D157016
2023-07-11  AtomicExpand: Fix expanding atomics into unconstrained FP in strictfp functions  (Matt Arsenault, 1 file, -0/+5)
Ideally, the normal fadd/fmin/fmax this was creating would fail the verifier. It's probably also necessary to force off FP exception handling in the cmpxchg loop, but we don't have a generic way to do that now. Note the strictfp builder is broken in the minnum/maxnum case: https://reviews.llvm.org/D154993
2023-01-24  IR: Add atomicrmw uinc_wrap and udec_wrap  (Matt Arsenault, 1 file, -1/+5)
These are essentially add/sub 1 with a clamping value. AMDGPU has instructions for these. CUDA/HIP expose these as atomicInc/atomicDec. Currently we use target intrinsics for these, but those do not carry the ordering and syncscope. Add these to atomicrmw so we can carry those and benefit from the regular legalization processes.
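The wrapping semantics can be sketched per the LangRef definitions of these operations; the helpers below model a single update, not the atomic instruction itself:

```cpp
#include <cstdint>

// atomicrmw uinc_wrap: increment, wrapping to 0 at the clamp value.
// (cur >= clamp) ? 0 : cur + 1
inline uint32_t uinc_wrap(uint32_t cur, uint32_t clamp) {
  return cur >= clamp ? 0 : cur + 1;
}

// atomicrmw udec_wrap: decrement, wrapping to the clamp value at 0
// (or when already above the clamp).
// (cur == 0 || cur > clamp) ? clamp : cur - 1
inline uint32_t udec_wrap(uint32_t cur, uint32_t clamp) {
  return (cur == 0 || cur > clamp) ? clamp : cur - 1;
}
```

These match the counter-style behaviour of CUDA/HIP atomicInc/atomicDec, cycling through 0..clamp.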
2023-01-23  [WoA] Use fences for sequentially consistent stores/writes  (Nadeem, Usman, 1 file, -1/+20)
LLVM currently uses LDAR/STLR and variants for acquire/release as well as seq_cst operations. This is fine as long as all code uses this convention. Normally LDAR/STLR act as one-way barriers, but when used in combination they provide a sequentially consistent model: when an LDAR appears after an STLR in program order, the STLR acts as a two-way fence and the store will be observed before the load. The problem is that normal loads (unlike LDAR), when they appear after the STLR, can be observed before the STLR (if my understanding is correct), possibly providing weaker than expected guarantees if they are used for ordered atomic operations. Unfortunately, the Microsoft Visual Studio STL implements seq_cst ld/st using normal loads/stores and explicit fences:
```
dmb ish; str; dmb ish
ldr; dmb ish
```
This patch uses fences for the MSVC target whenever we write to memory in a sequentially consistent way, so that we don't rely on the assumption that just using LDAR/STLR will give us sequentially consistent ordering. Differential Revision: https://reviews.llvm.org/D141748
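The fence-based store convention described above has a direct C++ analog: bracket a relaxed store with full fences instead of relying on a release store. This is a sketch of the lowering's shape (the function name is made up), not the actual AArch64 code the patch emits:

```cpp
#include <atomic>

// Analog of the MSVC-compatible seq_cst store (dmb ish; str; dmb ish):
// the fences, not the store itself, carry the ordering, so this composes
// correctly with code that uses plain loads plus fences.
inline void seqcst_store_with_fences(std::atomic<int> &v, int value) {
  std::atomic_thread_fence(std::memory_order_seq_cst); // dmb ish
  v.store(value, std::memory_order_relaxed);           // str
  std::atomic_thread_fence(std::memory_order_seq_cst); // dmb ish
}
```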
2023-01-05  Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part  (serge-sans-paille, 1 file, -7/+7)
Use deduction guides instead of helper functions. The only non-automatic changes have been:
1. `ArrayRef(some_uint8_pointer, 0)` needs to be changed into `ArrayRef(some_uint8_pointer, (size_t)0)` to avoid an ambiguous call with `ArrayRef((uint8_t*), (uint8_t*))`.
2. `CVSymbol sym(makeArrayRef(symStorage));` needed to be rewritten as `CVSymbol sym{ArrayRef(symStorage)};`, otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase.
3. ADL doesn't seem to work the same for deduction guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of `makeArrayRef(ArrayRef<T> &)` that acts as a no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process. This is a follow-up to https://reviews.llvm.org/D140896 that introduced the deduction guides. Differential Revision: https://reviews.llvm.org/D140955
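The mechanism behind the change is class template argument deduction (CTAD) replacing a make-style helper. A minimal illustration (`Span` here is a stand-in for `llvm::ArrayRef`, not the real class):

```cpp
#include <cstddef>

// With constructors like these, `Span s(arr);` deduces T directly via the
// implicit deduction guides, so a makeSpan(arr) helper is unnecessary.
template <typename T> struct Span {
  const T *Data;
  size_t Size;
  Span(const T *D, size_t S) : Data(D), Size(S) {}
  template <size_t N> Span(const T (&Arr)[N]) : Data(Arr), Size(N) {}
};
```

As item 1 above notes, deduction can become ambiguous where the helper wasn't, e.g. when a literal `0` could match either a length or a pointer, which is why a few call sites needed an explicit `(size_t)0`.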
2022-11-24  [CodeGen] Use poison instead of undef as placeholder in AtomicExpandPass [NFC]  (Manuel Brito, 1 file, -5/+5)
Differential Revision: https://reviews.llvm.org/D138483
2022-11-22  Revert "[CodeGen] Use poison instead of undef as placeholder in AtomicExpandPass [NFC]"  (Nuno Lopes, 1 file, -5/+5)
This reverts commit f50423c1a4422900aa1240fed643f5920451a88d.
2022-11-22  [CodeGen] Use poison instead of undef as placeholder in AtomicExpandPass [NFC]  (Manuel Brito, 1 file, -5/+5)
Differential Revision: https://reviews.llvm.org/D138483