path: root/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
Age | Commit message | Author | Files | Lines
2025-09-12  CodeGen: Remove MachineFunction argument from getRegClass (#158188)  [Matt Arsenault, 1 file, -2/+2]
    This is a low-level utility to parse the MCInstrInfo and should not depend on the state of the function.
2025-09-03  AMDGPU: Try to constrain av registers to VGPR to enable ds_write2 formation (#156400)  [Matt Arsenault, 1 file, -18/+57]
    In future changes we will have more AV_ virtual registers, which currently block the formation of write2. Most of the time these registers can simply be constrained to VGPR, so do that. This also relaxes the constraint in the flat merging case: we already have the necessary code to insert copies to the original result registers, so there is no point in avoiding it. Addresses the easy half of #155769.
2025-08-29  AMDGPU: Add debug print to load/store opt for agpr case (#155767)  [Matt Arsenault, 1 file, -0/+3]
2025-08-20  [AMDGPU] Support merging 16-bit and 8-bit TBUFFER load/store instructions (#145078)  [Harrison Hao, 1 file, -11/+56]
    SILoadStoreOptimizer can now recognise consecutive 16-bit and 8-bit `TBUFFER_LOAD`/`TBUFFER_STORE` instructions that each write a single component (`X`) or two components (`XY`), and fold them into the wider native variants:
    ```
    X + X         --> XY
    X + X + X + X --> XYZW
    XY + XY       --> XYZW
    X + X + X     --> XYZ
    XY + X        --> XYZ
    ```
    The optimisation cuts the number of TBUFFER instructions, shrinking code size and improving memory throughput.
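The folding rules above reduce to a simple check on component counts. A minimal sketch (illustrative only, not the actual SILoadStoreOptimizer code; three-operand merges such as X + X + X follow by merging pairwise):

```cpp
#include <cassert>

// Illustrative sketch: two adjacent TBUFFER ops writing W0 and W1 components
// can fold into a single op writing W0 + W1 components, provided each source
// is an X (1) or XY (2) variant and the result is a native width (2, 3 or 4).
// Returns the merged width, or 0 if the pair cannot be merged.
int mergedTbufferWidth(int W0, int W1) {
  bool SourcesOk = (W0 == 1 || W0 == 2) && (W1 == 1 || W1 == 2);
  int Sum = W0 + W1;
  if (SourcesOk && Sum >= 2 && Sum <= 4)
    return Sum;
  return 0; // cannot merge
}
```

For example, X + X yields XY (width 2), and XY + XY yields XYZW (width 4), matching the table above.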
2025-08-18  [AMDGPU] Support merging of flat GVS ops (#154200)  [Stanislav Mekhanoshin, 1 file, -0/+62]
2025-07-21  [AMDGPU] Prohibit load/store merge if scale_offset is set on gfx1250 (#149895)  [Stanislav Mekhanoshin, 1 file, -1/+4]
    Scaling is done on the operation size; after merging instructions we would need to generate code to scale the offset and reset the auto-scale bit. It is unclear whether that would be beneficial, so just disable such merges for now.
2025-07-17  [AMDGPU] Remove an unnecessary cast (NFC) (#149254)  [Kazu Hirata, 1 file, -2/+1]
    getTargetLowering() already returns const SITargetLowering *.
2025-05-23  [NFC][CodeGen] Adopt MachineFunctionProperties convenience accessors (#141101)  [Rahul Joshi, 1 file, -2/+1]
2025-02-12  [TableGen] Emit OpName as an enum class instead of a namespace (#125313)  [Rahul Joshi, 1 file, -10/+10]
    - Change InstrInfoEmitter to emit OpName as an enum class instead of an anonymous enum in the OpName namespace. This will help clearly distinguish between values that are OpNames vs just operand indices, and should help avoid bugs due to confusion between the two.
    - Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES.
    - Emit the declaration of getOperandIdx() along with the OpName enum so it doesn't have to be repeated in various headers.
    - Also update the AMDGPU, RISCV, and WebAssembly backends to conform to the new definition of OpName (mostly mechanical changes).
2025-02-06  [AMDGPU] Avoid repeated hash lookups (NFC) (#126001)  [Kazu Hirata, 1 file, -6/+8]
2024-10-03  [AMDGPU] Qualify auto. NFC. (#110878)  [Jay Foad, 1 file, -1/+1]
    Generated automatically with:
    $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)
2024-09-11  [AMDGPU] Make more use of getWaveMaskRegClass. NFC. (#108186)  [Jay Foad, 1 file, -1/+1]
2024-09-02  [CodeGen] Update a few places that were passing Register to raw_ostream::operator<< (#106877)  [Craig Topper, 1 file, -2/+3]
    These would implicitly cast the register to `unsigned`. Switching most of them to use printReg gives more readable output. Change some others to use Register::id() so we can eventually remove the implicit cast to `unsigned`.
2024-09-02  AMDGPU/NewPM: Port SILoadStoreOptimizer to NPM (#106362)  [Akshat Oke, 1 file, -17/+47]
2024-08-12  [AMDGPU] Add missing checks in processBaseWithConstOffset (#102310)  [Tim Gymnich, 1 file, -0/+6]
    Fixes https://github.com/llvm/llvm-project/issues/102231 by inserting missing checks.
2024-08-06  [AMDGPU][SILoadStoreOptimizer] Include constrained buffer load variants (#101619)  [Christudasan Devadasan, 1 file, -12/+69]
    Use the constrained buffer load opcodes while combining under-aligned loads for XNACK-enabled subtargets.
2024-07-23  [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (#96162)  [Christudasan Devadasan, 1 file, -8/+40]
    Consider the constrained multi-dword loads while merging individual loads into a single multi-dword load.
2024-06-06  [AMDGPU] Promote immediate offset to atomics (#94043)  [Stanislav Mekhanoshin, 1 file, -7/+0]
2024-05-31  [AMDGPU] Enable constant offset promotion to immediate FLAT (#93884)  [Stanislav Mekhanoshin, 1 file, -4/+10]
    Currently it is only supported for FLAT Global.
2024-05-30  [AMDGPU] Fix crash in the SILoadStoreOptimizer (#93862)  [Stanislav Mekhanoshin, 1 file, -1/+1]
    It did not properly handle the situation where the address calculation uses V_ADDC_U32 0, 0, carry-in (i.e. with both src0 and src1 being immediates).
2024-05-02  [AMDGPU] Use some merging/unmerging helpers in SILoadStoreOptimizer (#90866)  [Jay Foad, 1 file, -135/+76]
    Factor out copyToDestRegs and copyFromSrcRegs for merging store sources and unmerging load results. NFC.
2024-05-02  [AMDGPU] Modernize some syntax in SILoadStoreOptimizer. NFC.  [Jay Foad, 1 file, -32/+17]
    Use structured bindings and similar.
2024-05-01  [AMDGPU] Remove some pointless fallthrough annotations  [Jay Foad, 1 file, -6/+6]
2024-03-25  [AMDGPU] Add support for idxen and bothen buffer load/store merging in SILoadStoreOptimizer (#86285)  [David Stuttard, 1 file, -0/+16]
    Adds more buffer instruction merging support.
2024-03-17  [CodeGen] Use LocationSize for MMO getSize (#84751)  [David Green, 1 file, -1/+1]
    This is part of #70452, which changes the type used for the external interface of MMO to LocationSize as opposed to uint64_t. This means the constructors take LocationSize and convert ~UINT64_C(0) to LocationSize::beforeOrAfter(), and the getSize methods return a LocationSize. This allows us to be more precise with unknown sizes, not accidentally treating them as unsigned values, and in the future should allow us to add proper scalable vector support, but none of that is included in this patch; it should mostly be an NFC. GlobalISel is still expected to use the underlying LLT as it needs, and is not expected to see unknown sizes for generic operations. Most of the changes are hopefully fairly mechanical, adding a lot of getValue() calls and protecting them with hasValue() where needed.
2023-12-15  [AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492)  [Mirko Brkušanin, 1 file, -4/+24]
2023-12-15  [AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488)  [Mirko Brkušanin, 1 file, -10/+13]
2023-12-15  [AMDGPU] CodeGen for SMEM instructions (#75579)  [Mirko Brkušanin, 1 file, -2/+37]
2023-08-11  [AMDGPU] Add sanity check that fixes bad shift operation in AMD backend  [Konrad Kusiak, 1 file, -0/+3]
    There is a problem with the SILoadStoreOptimizer::dmasksCanBeCombined() function that can lead to UB. This boolean function decides whether two masks can be combined into one. The idea is that the bits which are "on" in one mask must not overlap with the "on" bits of the other. Consider an example (10 bits for simplicity):

    Mask 1: 0101101000
    Mask 2: 0000000110

    These can be combined into a single mask: 0101101110. To check whether such an operation is possible, the code takes the greater mask and counts how many 0s there are, starting from the LSB and stopping at the first 1. It then shifts 1u by this count and compares the result with the smaller mask. The problem is that when both masks are 0, the counter finds 32 zeroes in the first mask and tries to shift by 32 positions, which is UB. The fix is a simple sanity check on whether the bigger mask is 0.

    https://reviews.llvm.org/D155051
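The check described above, with the sanity check in place, can be sketched as follows (simplified and self-contained; not the exact LLVM source):

```cpp
#include <cassert>
#include <cstdint>

// Two dmasks are combinable when every set bit of the smaller mask lies
// strictly below the lowest set bit of the larger one.
bool dmasksCanBeCombined(uint32_t MaskA, uint32_t MaskB) {
  uint32_t MaxMask = MaskA > MaskB ? MaskA : MaskB;
  uint32_t MinMask = MaskA > MaskB ? MaskB : MaskA;
  // The sanity check from this commit: with MaxMask == 0 the trailing-zero
  // count below would reach 32, and `1u << 32` is undefined behavior.
  if (MaxMask == 0)
    return false;
  // Count zeros from the LSB up to the first 1 of the larger mask.
  unsigned TrailingZeros = 0;
  while (!((MaxMask >> TrailingZeros) & 1u))
    ++TrailingZeros;
  // All bits of the smaller mask must fit below the larger mask's lowest bit.
  return MinMask < (1u << TrailingZeros);
}
```

With the masks from the example, the larger mask has three trailing zeros, 1u << 3 == 8, and the smaller mask (0b0000000110 == 6) is below that bound, so the pair is combinable.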
2023-06-21  [AMDGPU] Minor refactoring in SILoadStoreOptimizer::offsetsCanBeCombined  [Jay Foad, 1 file, -3/+6]
2023-04-25  [AMDGPU] Do not handle _SGPR SMEM instructions in SILoadStoreOptimizer  [Jay Foad, 1 file, -31/+5]
    After D147334 we never select the _SGPR forms of SMEM instructions on subtargets that also support the _SGPR_IMM form, so there is no need to handle them here. Differential Revision: https://reviews.llvm.org/D149139
2023-04-10  [AMDGPU] Extend tbuffer_load_format merge  [mmarjano, 1 file, -0/+4]
    Add support for merging the _IDXEN and _BOTHEN variants of the TBUFFER_LOAD_FORMAT instruction.
2023-03-14  [Target] Use *{Set,Map}::contains (NFC)  [Kazu Hirata, 1 file, -2/+2]
2023-01-28  [Target] Use llvm::count{l,r}_{zero,one} (NFC)  [Kazu Hirata, 1 file, -2/+2]
2023-01-22  Use llvm::popcount instead of llvm::countPopulation (NFC)  [Kazu Hirata, 1 file, -4/+5]
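llvm::popcount mirrors C++20's std::popcount: it counts the set bits in an integer. A portable C++17 stand-in using std::bitset, shown only to illustrate what the renamed call computes (this helper is not part of LLVM):

```cpp
#include <bitset>
#include <cstdint>

// Count the number of 1 bits in a 32-bit value, the operation performed by
// llvm::popcount (and, in C++20, std::popcount).
unsigned popcount32(uint32_t Value) {
  return static_cast<unsigned>(std::bitset<32>(Value).count());
}
```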
2022-12-14  [AMDGPU] Stop using make_pair and make_tuple. NFC.  [Jay Foad, 1 file, -3/+3]
    C++17 allows us to call the constructors of pair and tuple directly instead of the helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828
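The replacement in miniature: with C++17 class template argument deduction, the constructor deduces the element types, so the make_* helpers are no longer needed (the functions below are hypothetical examples, not code from this file):

```cpp
#include <tuple>
#include <utility>

// CTAD deduces std::pair<int, const char *> from the constructor arguments.
std::pair<int, const char *> makeEntry() {
  return std::pair(42, "offset"); // was: std::make_pair(42, "offset")
}

// Likewise for std::tuple.
std::tuple<int, int, bool> makeKey() {
  return std::tuple(1, 2, true); // was: std::make_tuple(1, 2, true)
}
```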
2022-12-13  [CodeGen] llvm::Optional => std::optional  [Fangrui Song, 1 file, -2/+2]
2022-12-02  [Target] Use std::nullopt instead of None (NFC)  [Kazu Hirata, 1 file, -2/+2]
    This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
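The migration pattern in a nutshell: llvm::Optional<T> becomes std::optional<T>, and the empty sentinel None becomes std::nullopt (the function below is a hypothetical example, not code from this file):

```cpp
#include <optional>

// Returns an offset when one is available, or an empty optional otherwise.
std::optional<int> findOffset(bool HasOffset) {
  if (!HasOffset)
    return std::nullopt; // was: return None;
  return 16;
}
```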
2022-11-14  [AMDGPU][MC] Support TFE modifiers in MUBUF loads and stores.  [Ivan Kosarev, 1 file, -4/+0]
    Reviewed By: dp, arsenm. Differential Revision: https://reviews.llvm.org/D137783
2022-11-08  [AMDGPU] Add & use `hasNamedOperand`, NFC  [Pierre van Houtryve, 1 file, -3/+3]
    In a lot of places, we were just calling `getNamedOperandIdx` to check whether the result was equal to -1. This is fine in itself, but it's verbose and doesn't make the intention clear. I added a `hasNamedOperand` and replaced all the cases I could find, with regexes and manually. Reviewed By: arsenm, foad. Differential Revision: https://reviews.llvm.org/D137540
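A toy mock of the pattern (the names follow the commit, but this is not the real LLVM interface): getNamedOperandIdx returns -1 when an instruction lacks the operand, and hasNamedOperand wraps that check so call sites read as a predicate rather than a comparison against -1:

```cpp
// Hypothetical operand names; the real enum is generated by TableGen.
enum class OpName { vaddr, saddr, offset };

// Toy lookup table: opcode 0 has vaddr at index 0 and offset at index 1.
int getNamedOperandIdx(unsigned Opcode, OpName Name) {
  if (Opcode == 0 && Name == OpName::vaddr)
    return 0;
  if (Opcode == 0 && Name == OpName::offset)
    return 1;
  return -1; // operand not present
}

// The wrapper this commit introduced: intent-revealing predicate form.
bool hasNamedOperand(unsigned Opcode, OpName Name) {
  return getNamedOperandIdx(Opcode, Name) != -1;
}
```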
2022-09-15  [AMDGPU][SILoadStoreOptimizer] Merge SGPR_IMM scalar buffer loads.  [Ivan Kosarev, 1 file, -10/+77]
    Reviewed By: foad, rampitec. Differential Revision: https://reviews.llvm.org/D133787
2022-08-08  [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC  [Fangrui Song, 1 file, -7/+7]
    With C++17 there is no Clang pedantic warning or MSVC C5051.
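LLVM_FALLTHROUGH was a macro expanding to a compiler-specific attribute; C++17 standardizes the annotation as [[fallthrough]]. A small example of the rewritten pattern (the function itself is hypothetical, not from this file):

```cpp
// Accumulate a byte count, deliberately falling through the cases.
int bytesForDwords(int Dwords) {
  int Bytes = 0;
  switch (Dwords) {
  case 4:
    Bytes += 8;
    [[fallthrough]]; // was: LLVM_FALLTHROUGH;
  case 2:
    Bytes += 4;
    [[fallthrough]]; // was: LLVM_FALLTHROUGH;
  case 1:
    Bytes += 4;
    break;
  }
  return Bytes;
}
```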
2022-07-30  [AMDGPU] Extend SILoadStoreOptimizer to s_load instructions  [Carl Ritson, 1 file, -5/+38]
    Apply merging to s_load as is done for s_buffer_load. Reviewed By: foad. Differential Revision: https://reviews.llvm.org/D130742
2022-03-09  [AMDGPU] Merge flat with global in the SILoadStoreOptimizer  [Stanislav Mekhanoshin, 1 file, -30/+53]
    Flat can be merged with flat global since the address cast is a no-op. A combined memory operation needs to be promoted to flat. Differential Revision: https://reviews.llvm.org/D120431
2022-02-28  [AMDGPU] Extend SILoadStoreOptimizer to handle flat load/stores  [Stanislav Mekhanoshin, 1 file, -9/+71]
    TODO: merge flat with global, promoting to flat. Differential Revision: https://reviews.llvm.org/D120351
2022-02-24  [AMDGPU] Extend SILoadStoreOptimizer to handle global stores  [Stanislav Mekhanoshin, 1 file, -1/+111]
    TODO: merge flat load/stores. TODO: merge flat with global, promoting to flat. Differential Revision: https://reviews.llvm.org/D120346
2022-02-24  [AMDGPU] Fix combined MMO in load-store merge  [Stanislav Mekhanoshin, 1 file, -54/+36]
    Loads and stores can be out of order in the SILoadStoreOptimizer. When combining the MachineMemOperands of two instructions, the operands are passed to combineKnownAdjacentMMOs in IR order. At the moment it picks the first operand and just replaces its offset and size. This essentially loses alignment information and may generally result in an incorrect base pointer being used. Use the base pointer that comes first in memory-address order instead, and only adjust the size. Differential Revision: https://reviews.llvm.org/D120370
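A hedged sketch of the fix described above, using a toy memory-operand type rather than the real MachineMemOperand API: the merged operand takes its base from whichever access comes first in memory-address order, so its pointer and alignment stay valid, and only the size is widened:

```cpp
#include <cstdint>

// Toy stand-in for a memory operand: an address and an access size.
struct ToyMMO {
  uint64_t Offset; // address of the access
  uint64_t Size;   // size in bytes
};

// Combine two known-adjacent accesses. Choosing the base in memory-address
// order (not IR order) keeps the base pointer and alignment correct.
ToyMMO combineAdjacent(const ToyMMO &A, const ToyMMO &B) {
  const ToyMMO &Lo = A.Offset <= B.Offset ? A : B;
  return {Lo.Offset, A.Size + B.Size};
}
```

If the IR-order operands are the accesses at addresses 8 and 4, the merged operand must be based at 4, which naive "take the first operand" logic gets wrong.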
2022-02-22  [AMDGPU] Extend SILoadStoreOptimizer to handle global saddr loads  [Stanislav Mekhanoshin, 1 file, -1/+41]
    This adds handling of the _SADDR forms to the GLOBAL_LOAD combining. TODO: merge global stores. TODO: merge flat load/stores. TODO: merge flat with global, promoting to flat. Differential Revision: https://reviews.llvm.org/D120285
2022-02-22  [AMDGPU] Extend SILoadStoreOptimizer to handle global loads  [Stanislav Mekhanoshin, 1 file, -0/+83]
    There can be situations where global and flat loads and stores are not combined by the vectorizer, in particular if their address spaces differ in the IR but they end up as the same class of instructions after selection. For example, a divergent load from the constant address space ends up being the same global_load as a load from the global address space. TODO: merge global stores. TODO: handle SADDR forms. TODO: merge flat load/stores. TODO: merge flat with global, promoting to flat. Differential Revision: https://reviews.llvm.org/D120279
2022-02-21  [AMDGPU] Remove redundant check in the SILoadStoreOptimizer  [Stanislav Mekhanoshin, 1 file, -2/+1]
    Differential Revision: https://reviews.llvm.org/D120268