Age | Commit message | Author | Files | Lines |
|
Loop headers frequently consume the loop-carried value in the header
block via non-lookthrough ops (e.g. byte-wise vector binops).
LiveRegOptimizer’s same-BB filter currently prunes these users, so the
loop-carried PHI is not coerced to i32 and the intended packed form is
lost.
Relax the filter: when the def is a PHI, allow same-BB non-lookthrough
users. Also fix the check to look at the user (CII) rather than the def
(II) so the walk does not terminate prematurely.
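A minimal sketch of the relaxed filter in that spirit; `isNonLookthrough` is a hypothetical stand-in for the pass's real lookthrough test, and the surrounding structure is an assumption, not the actual LiveRegOptimizer code:
```
// Sketch only: isNonLookthrough() is hypothetical; II is the def being
// walked, CII one of its users (names from the commit message).
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

static bool isNonLookthrough(const Instruction *I); // hypothetical helper

static void collectUsersToCoerce(Instruction *II,
                                 SmallVectorImpl<Instruction *> &Out) {
  for (User *U : II->users()) {
    auto *CII = dyn_cast<Instruction>(U);
    if (!CII)
      continue;
    // Old behavior: same-BB users were pruned unconditionally, and the
    // lookthrough test was applied to the def (II), ending the walk early.
    // New behavior: when the def is a PHI, keep same-BB non-lookthrough
    // users (e.g. byte-wise vector binops in the loop header), and apply
    // the lookthrough test to the user (CII).
    if (CII->getParent() == II->getParent() &&
        !(isa<PHINode>(II) && isNonLookthrough(CII)))
      continue;
    Out.push_back(CII);
  }
}
```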
|
|
Fix a crash that arose when intrinsics like llvm.masked.load.T.p7 were
still in the module when AMDGPULowerBufferFatPointers ran: a
captures(none) annotation would be applied to a non-pointer value,
triggering a verifier failure.
---------
Co-authored-by: Shilei Tian <i@tianshilei.me>
|
|
This is primarily to avoid folding a frame index materialized
into an SGPR into the pseudo, which would end up looking like:
```
%sreg = s_mov_b32 %stack.0
%av_32 = av_mov_b32_imm_pseudo %sreg
```
which is not useful.
Match the check used for the b64 case. This is limited to the
pseudo to avoid regressions from gfx908's special case: it
expects to pass here with v_accvgpr_write_b32 for illegal cases
and to stay in the intermediate state with an SGPR input.
This avoids regressions in a future patch.
|
|
This is a temporary fix for a regression from #154875.
The new pattern sets the hi part of the V_BFI result, and that confuses
si-fix-sgpr-copies, which is where the proper fix likely belongs.
During si-fix-sgpr-copies, an incorrect fold turns:
```
%86:vgpr_32 = V_BFI_B32_e64
%87:sreg_32 = COPY %86.hi16:vgpr_32
%95:vgpr_32 = nofpexcept V_PACK_B32_F16_t16_e64 0, killed %87:sreg_32, 0, %63:vgpr_16, 0, 0
```
into:
```
%86:vgpr_32 = V_BFI_B32_e64
%95:vgpr_32 = nofpexcept V_PACK_B32_F16_t16_e64 0, %86.lo16:vgpr_32, 0, %63:vgpr_16, 0, 0
```
Fixes: Vulkan CTS dEQP-VK.glsl.builtin.precision_fp16_storage32b.*.
|
|
Selecting VGPRs for the uniform version of this pattern may lead to
unnecessary VGPR usage and waterfall loops.
|
|
SMLoc itself encapsulates just a pointer, so there is no need to pass or
return it by reference.
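For illustration, a sketch of what by-value passing looks like (not taken from the patch; `advance` is a made-up function):
```
#include "llvm/Support/SMLoc.h"
#include <cstddef>

// SMLoc holds only a const char *, so copying it costs the same as
// copying a reference and avoids an indirection.
static llvm::SMLoc advance(llvm::SMLoc Loc, size_t N) { // by value
  return llvm::SMLoc::getFromPointer(Loc.getPointer() + N);
}
```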
|
|
|
|
The RC of the folded operand does not need to be constrained based on
the RC of the current operand we are folding into.
The purpose of this PR is to facilitate
https://github.com/llvm/llvm-project/pull/151033.
|
|
Fix two bugs. The first bug hid the second bug.
1. Calculate IsVALU correctly during UADDO/USUBO selection. IsVALU
should be false if the carryout users are UADDO_CARRY/USUBO_CARRY.
However instruction selection visits uses before defs, so the
UADDO_CARRY/USUBO_CARRY nodes are normally (probably always) already
converted to S_ADD_CO_PSEUDO/S_SUB_CO_PSEUDO. Fix to check for these
machine opcodes.
2. Without this fix, UADDO/USUBO selection will always select the VALU
instructions V_ADD_CO_U32_e64/V_SUB_CO_U32_e64.
S_UADDO_PSEUDO/S_USUBO_PSEUDO were never selected in the CodeGen/AMDGPU
tests. Thus, S_UADDO_PSEUDO/S_USUBO_PSEUDO cases were never hit in
EmitInstrWithCustomInserter. The code generation for
S_UADDO_PSEUDO/S_USUBO_PSEUDO had a bug: it could not handle a 32-bit
$scc_out.
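A hedged sketch of fix 1; the opcode names come from this message, while the helper, the iteration shape, and the carryout result number are assumptions about the selector code:
```
// Sketch only; the AMDGPU::* opcodes come from the generated AMDGPU
// instruction info headers.
#include "llvm/CodeGen/SelectionDAGNodes.h"

// Returns true if some carryout user forces the VALU forms.
static bool carryoutForcesVALU(llvm::SDNode *N) {
  for (llvm::SDUse &U : N->uses()) {
    if (U.getResNo() != 1)
      continue; // only users of the carryout result matter
    llvm::SDNode *User = U.getUser();
    // Selection visits uses before defs, so UADDO_CARRY/USUBO_CARRY users
    // have normally already been turned into machine nodes; check for the
    // scalar pseudos rather than the ISD opcodes.
    if (User->isMachineOpcode() &&
        (User->getMachineOpcode() == AMDGPU::S_ADD_CO_PSEUDO ||
         User->getMachineOpcode() == AMDGPU::S_SUB_CO_PSEUDO))
      continue; // scalar carry consumer: stay on the SALU path
    return true;
  }
  return false;
}
```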
---------
Signed-off-by: John Lu <John.Lu@amd.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
|
|
Rename canonical register names with a WAVE_ prefix for GFX12.
Maintain backward compatibility through aliases.
|
|
Make sure we cannot be in a mode with both wavesizes. This
prevents assertions in a future change. This should probably
just be an error, but we do not have a good way to report
errors from the MCSubtargetInfo constructor.
|
|
Add v4,v8,v16,v32 legalizations for the following operations:
- `FADD`
- `FMUL`
- `FMA`
- `FCANONICALIZE`
|
|
This change builds on https://github.com/llvm/llvm-project/pull/160319
which tries to clarify which *callers* (not backends) assume that the
result is actually trivial.
This change itself should be NFC. Essentially, I'm just renaming the
existing isTrivialRematerializable to the non-trivial version and then
adding a new trivial version (with the same name as the prior function)
and simplifying a few callers which want that semantic.
This change does *not* enable non-trivial remat any more broadly than
was already done for our targets which were lying through the old APIs;
that will come separately. The goal here is simply to make the code
easier to follow in terms of what assumptions are being made where.
---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
|
|
|
|
This patch includes:
1. The fma_mix instruction takes an fp16 input but places the operand in
a vgpr32. Update the selector to insert a vgpr32 for true16 mode if
necessary.
2. The fma_mix instruction returns an fp16 output but places the vdst in
a vgpr32. Create an fma_mix_t16 pseudo instruction for the isel pattern
and lower it to mix_lo/hi in the MC lowering pass.
These changes stop isel from emitting an illegal `vgpr32 = COPY vgpr16`
and improve code quality.
|
|
On new targets like `gfx1250`, the buffer resource (V#) now uses this
format:
```
base (57-bit): resource[56:0]
num_records (45-bit): resource[101:57]
reserved (6-bit): resource[107:102]
stride (14-bit): resource[121:108]
```
This PR changes the type of `num_records` from `i32` to `i64` in both
the builtin and the intrinsic, and adds support for lowering the new
format.
Fixes SWDEV-554034.
---------
Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>
|
|
Move common declarations from switch cases to function entry.
Signed-off-by: John Lu <John.Lu@amd.com>
|
|
And rework the lit64() support to use it.
The rules for when to add lit64() can be simplified and
improved. In this change, however, we just follow the existing
conventions on the assembler and disassembler sides.
In codegen we do not (and normally should not need to) add explicit
lit() and lit64() modifiers, so the codegen tests lose them. The change
is otherwise NFCI.
It also simplifies operand printing.
|
|
the global AS (#160129)
Mostly NFC, and adds an assertion for gfx12 to ensure that no atomic scratch
instructions are present in the case of GloballyAddressableScratch. This should
always hold because of #154710.
|
|
Remove recursion to avoid stack overflow on large CFGs.
Avoid worklist for hazard search within single MachineBasicBlock.
Ensure predecessors are visited for all state combinations.
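The generic shape of such a change, as a self-contained sketch (illustrative only, not the actual pass code):
```
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

struct Block { std::vector<Block *> preds; };

// Explicit worklist instead of recursion: deep CFGs cannot overflow the
// stack, and tracking (block, state) pairs lets a block be revisited
// under a different state combination without looping forever.
void visitPredecessors(Block *entry, uint32_t initialState) {
  std::vector<std::pair<Block *, uint32_t>> worklist{{entry, initialState}};
  std::set<std::pair<Block *, uint32_t>> visited;
  while (!worklist.empty()) {
    auto [b, state] = worklist.back();
    worklist.pop_back();
    if (!visited.insert({b, state}).second)
      continue; // this state combination was already processed here
    uint32_t outState = state; // would be updated by the hazard scan of b
    for (Block *p : b->preds)
      worklist.push_back({p, outState});
  }
}
```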
|
|
Ensure live intervals for EXEC and SCC are removed on all paths which
generate instructions.
|
|
|
|
This can happen when `xor cond, -1` is not combined.
|
|
.. to isReMaterializableImpl. The "Really" naming has always been
awkward, and we're working towards removing the "Trivial" part now,
so go ahead and remove both pieces in a single rename.
Note that this doesn't change any aspect of the current
implementation; we still "mostly" only return instructions which
are trivial (meaning no virtual register uses), but some targets
do lie about that today.
|
|
|
|
They represent mutually exclusive values of the same attribute.
|
|
The pattern does not factor in saddr. There is no way to write a test
for it because gfx1200 has neither sram-ecc nor saddr, and gfx1250 has
sram-ecc but does not fall into this preserving category. Nevertheless,
the day we can fix it, this would become a problem. For now it is OK
that the change does not fail.
This was untested before and it is untested now, but at least the t16
block uses t16 patterns.
|
|
Reverts llvm/llvm-project#154115
Co-authored-by: ronlieb <ron.lieberman@amd.com>
|
|
This patch makes it so that InstrPostProcess::postProcessInstruction
takes in a reference to a mca::Instruction rather than a reference to a
std::unique_ptr. Without this, InstrPostProcess cannot be used with MCA
instruction recycling because it needs to be called on both newly
created instructions and instructions that have been recycled. We only
have access to a raw pointer for instructions that have been recycled
rather than a reference to the std::unique_ptr that owns them.
This patch adds a call in the existing instruction recycling unit test
to ensure the API remains compatible with this use case.
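A simplified view of the interface change (the "before" shape follows the description above; qualifiers and other parameters are elided):
```
#include <memory>

namespace llvm { class MCInst; namespace mca { class Instruction; } }

struct Before {
  // Requires the owning unique_ptr, which is unavailable for recycled
  // instructions that the caller only holds by raw pointer.
  virtual void postProcessInstruction(
      std::unique_ptr<llvm::mca::Instruction> &Inst,
      const llvm::MCInst &MCI) = 0;
};

struct After {
  // A plain reference works for both newly created and recycled
  // instructions.
  virtual void postProcessInstruction(llvm::mca::Instruction &Inst,
                                      const llvm::MCInst &MCI) = 0;
};
```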
|
|
A fence release could be followed by a barrier, so it should wait for
the relevant memory accesses to complete even if it is MMRA-limited to
LDS. Previously, that wait was skipped for non-global fence releases.
Fixes SWDEV-554932.
|
|
This patch adds the MIR parsing and serialization support for save and
restore points with subsets of callee saved registers. That is, it
syntactically allows a function to contain two or more distinct
sub-regions in which distinct subsets of registers are spilled/filled as
callee save. This is useful if e.g. one of the CSRs isn't modified in
one of the sub-regions, but is in the other(s).
Support for actually using this capability in code generation is still
forthcoming. This patch is the next logical step for multiple
save/restore points support.
All points are now stored in a DenseMap from MBB to a vector of
CalleeSavedInfo.
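The shape of that storage, as a sketch (the alias name is invented; the types are the ones named above):
```
#include "llvm/ADT/DenseMap.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFrameInfo.h" // CalleeSavedInfo
#include <vector>

// Each save/restore point maps to the subset of CSRs handled there.
using SaveRestorePoints =
    llvm::DenseMap<llvm::MachineBasicBlock *,
                   std::vector<llvm::CalleeSavedInfo>>;
```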
Shrink-Wrap points split Part 4.
RFC:
https://discourse.llvm.org/t/shrink-wrap-save-restore-points-splitting/83581
Part 1: https://github.com/llvm/llvm-project/pull/117862 (landed)
Part 2: https://github.com/llvm/llvm-project/pull/119355 (landed)
Part 3: https://github.com/llvm/llvm-project/pull/119357 (landed)
Part 5: https://github.com/llvm/llvm-project/pull/119359 (likely to be
further split)
|
|
Since #154205 some subtargets can use up to 32 user SGPRs. Add names for
them all so they can be pretty printed in PAL metadata.
|
|
|
|
Streamline code by only declaring TRI/TII once and using isWave64().
Signed-off-by: John Lu <John.Lu@amd.com>
|
|
Use correct unsigned overflow instructions for
S_UADDO_PSEUDO/S_USUBO_PSEUDO. Note that this issue was hidden because
instruction selection never selected S_UADDO_PSEUDO/S_USUBO_PSEUDO, which
will be addressed in https://github.com/llvm/llvm-project/pull/159814.
Signed-off-by: John Lu <John.Lu@amd.com>
|
|
(#160037)
"class HasMember##member" detects a specific member with a complex
SFINAE logic involving multiple inheritance. This patch simplifies
that by switching to llvm::is_detected.
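A standalone illustration of the detection idiom the patch switches to; `HasFoo` and `foo()` are made-up names, not the members the real macro checks:
```
#include "llvm/ADT/STLExtras.h" // llvm::is_detected
#include <utility>

template <typename T>
using has_foo_t = decltype(std::declval<T &>().foo());

template <typename T>
constexpr bool HasFoo = llvm::is_detected<has_foo_t, T>::value;

struct WithFoo { void foo(); };
struct WithoutFoo {};
static_assert(HasFoo<WithFoo> && !HasFoo<WithoutFoo>,
              "one alias template replaces the SFINAE class machinery");
```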
|
|
|
|
|
|
Without this patch, we compute a type trait in a roundabout manner:
- Compute a boolean value in the primary template.
- Pass the value to std::enable_if_t.
- Return std::true_type (or std::false_type on the fallback path).
- Compare the return type to std::true_type.
That is, when the expression for the first boolean value above is well
formed, we already have the answer we are looking for.
This patch bypasses the entire sequence by having the primary template
return std::bool_constant and adjusting RESULT to extract the ::value
of the boolean type.
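A self-contained sketch of the shortcut with invented names (the real code checks a different expression):
```
#include <type_traits>
#include <utility>

// Well formed only when T + T is valid; the primary template returns the
// answer as a std::bool_constant instead of routing it through
// std::enable_if_t and std::true_type.
template <typename T>
auto hasPlusImpl(int)
    -> std::bool_constant<(sizeof(std::declval<T>() + std::declval<T>()) > 0)>;
template <typename T> std::false_type hasPlusImpl(...); // fallback path

template <typename T>
constexpr bool HasPlus = decltype(hasPlusImpl<T>(0))::value;

struct NoPlus {};
static_assert(HasPlus<int> && !HasPlus<NoPlus>, "::value read directly");
```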
|
|
(#157968)
This is a cleaned up version of PR #151704. These optimizations are now
performed post-RA scheduling.
|
|
This is a generalization of the LookupPtrRegClass mechanism.
AMDGPU has several use cases for swapping the register class of
instruction operands based on the subtarget, but none of them
really fit into the box of being pointer-like.
The current system requires manual management of an arbitrary integer
ID. For the AMDGPU use case, this would end up being around 40 new
entries to manage.
This just introduces the base infrastructure. I have ports of all
the target-specific usage of PointerLikeRegClass ready.
|
|
This PR adds a TargetLowering hook, canTransformPtrArithOutOfBounds,
that targets can use to allow transformations to introduce out-of-bounds
pointer arithmetic. It also moves two such transformations from the
AMDGPU-specific DAG combines to the generic DAGCombiner.
This is motivated by target features like AArch64's checked pointer
arithmetic, CPA, which does not tolerate the introduction of
out-of-bounds pointer arithmetic.
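A hedged sketch of how a CPA-like target might use the hook; only the hook's name comes from this PR, and the parameters and override shape here are assumptions:
```
// Sketch only: the real hook's signature may differ.
class MyCPATargetLowering /* : public llvm::TargetLowering */ {
public:
  // Returning false keeps the combines from introducing pointer
  // arithmetic that transiently leaves the underlying object's bounds.
  bool canTransformPtrArithOutOfBounds(/* hypothetical parameters */) const {
    return false;
  }
};
```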
|
|
There are more places in SIISelLowering.cpp and AMDGPUISelDAGToDAG.cpp
that check for ISD::ADD in a pointer context, but as far as I can tell
those are only relevant for 32-bit pointer arithmetic (like frame
indices/scratch addresses and LDS), for which we don't enable PTRADD
generation yet.
For SWDEV-516125.
|
|
The manual legalizeOperands code only needs to consider cases that
require full instruction context to know if the operand is legal.
It does not need to handle basic operand register class constraints.
|
|
|
|
|
|
|
|
This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.
For SWDEV-516125.
|
|
(#157843)
Add a new event, SCC_WRITE, for s_barrier_signal_isfirst and
s_barrier_leave, instructions that write to SCC; the associated counter
is KM_CNT.
Also start tracking SCC reads and writes.
s_barrier_wait on the same barrier guarantees that the SCC write from
s_barrier_signal_isfirst has landed, so there is no need to insert
s_wait_kmcnt.
|
|
The operand constraints already express this requirement, and
InstrEmitter will respect them.
|