Age | Commit message | Author | Files | Lines |
|
|
|
We have proper encoding facilities to encode operands and instructions;
there's no need to pollute the MC representation with encoding details.
Intended to be an NFCI, but it happens to fix some re-encoded instruction
codes in disassembler tests.
The 64-bit operands are to be addressed in follow-up patches introducing
an MC-level representation for the lit() and lit64() modifiers, to then be
respected by both the assembler and disassembler.
|
|
Since much of the related code is interconnected, this also changes how the workgroup ID is lowered.
Co-authored-by: Jay Foad <jay.foad@amd.com>
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
|
|
Since the register file size was increased, it is no longer valid to
call VGPR_32RegClass.getNumRegs() to get the total number of arch
registers available on a subtarget.
Fixes: SWDEV-550425
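A minimal sketch of the distinction, not the actual patch: it assumes GCNSubtarget::getAddressableNumVGPRs() (or an equivalent subtarget query) reflects the per-subtarget limit, and the wrapper name is purely illustrative.

#include "GCNSubtarget.h"

// Illustrative only: the register class now describes the largest possible
// register file, so its size over-counts on smaller subtargets.
static unsigned getNumArchVGPRsSketch(const llvm::GCNSubtarget &ST) {
  // Old (now wrong): class size == per-subtarget register count.
  //   return llvm::AMDGPU::VGPR_32RegClass.getNumRegs();
  // New: ask the subtarget itself (assumed helper; see the actual change).
  return ST.getAddressableNumVGPRs();
}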
|
|
|
|
|
|
GFX12+ buffer ops require a non-negative InstOffset per the AMD hardware spec.
The assembler and disassembler were modified to reject negative buffer offsets.
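A minimal sketch of the kind of check this implies on the assembler side; only AMDGPU::isGFX12Plus() is a real helper here, the function name and diagnostics are assumptions.

#include "Utils/AMDGPUBaseInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include <cstdint>

// Sketch: GFX12+ cannot encode a negative buffer offset, so reject it early.
static bool isValidBufferOffsetSketch(const llvm::MCSubtargetInfo &STI,
                                      int64_t Offset) {
  if (llvm::AMDGPU::isGFX12Plus(STI) && Offset < 0)
    return false; // caller would emit an "invalid offset"-style diagnostic
  return true;    // pre-GFX12 targets keep accepting signed offsets
}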
|
|
This is baseline support; it is not usable yet.
|
|
The goal is to expose more variants that can operate without
preconstructed MachineInstrs or MachineOperands.
|
|
|
|
|
|
PR #149247 made the MD accessible to the backend, so we can now leverage
it in the memory model. The first use case here is detecting whether a flat op
can access scratch memory.
This benefits both the MemoryLegalizer and InsertWaitCnt.
|
|
|
|
- gfx1250 only supports cu mode
|
|
|
|
This is NFCI at this point.
|
|
operand on gfx12+ (#152465)
Sec. 4.6.7.1 of the gfx1250 SPG states that if an SGPR is used
as an operand, only one SGPR will be read for both the low and high
operations. As a result, the corresponding bits in `op_sel` and
`op_sel_hi` must be the same when the operand is an SGPR.
Co-authored-by: Tian, Shilei <Shilei.Tian@amd.com>
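A small sketch of the stated constraint, with placeholder parameters rather than the real operand-verification code.

// For a packed operand that is an SGPR, the low and high halves read the same
// register, so the corresponding op_sel and op_sel_hi bits must agree.
static bool isValidPackedSGPROpSelSketch(bool OperandIsSGPR, bool OpSelBit,
                                         bool OpSelHiBit) {
  if (!OperandIsSGPR)
    return true;                 // VGPR operands may select halves independently
  return OpSelBit == OpSelHiBit; // one SGPR read feeds both halves
}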
|
|
|
|
Also fixes an assertion on out-of-bounds physical register
indexes.
|
|
|
|
|
|
|
|
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
|
|
(#149208)
WMMA XDL instructions are tracked as TRANS ops, and the compiler should
consider them the same as TRANS during S_DELAY_ALU insertion. We use a searchable
table so that the InsertDelayAlu pass can recognize these WMMA XDL instructions.
Co-authored-by: Stefan Stipanovic <Stefan.Stipanovic@amd.com>
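As a sketch of the classification involved, with both lookups standing in for the generated searchable table and TII queries (assumed names):

// Placeholder lookups; the real pass would consult TII and the new table.
bool isTransOpSketch(unsigned Opcode);
bool isWMMAXDLTableLookupSketch(unsigned Opcode);

// WMMA XDL ops occupy the TRANS pipe for delay purposes, so they receive the
// same s_delay_alu treatment as native TRANS instructions.
static bool needsTransDelaySketch(unsigned Opcode) {
  return isTransOpSketch(Opcode) || isWMMAXDLTableLookupSketch(Opcode);
}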
|
|
|
|
|
|
|
|
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
|
|
This patch tracks the register operands of both VMEM (FLAT, MUBUF,
MTBUF) and SMEM load-store operations and inserts an S_WAIT_XCNT
instruction with a sufficient wait-count before potentially redefining
them. For VMEM instructions, XNACK is returned in the same order as
they were issued, and hence non-zero counter values can be inserted.
However, SMEM execution is out of order, and so is its XNACK reception.
Thus, only a zero counter value can be inserted to capture SMEM dependencies.
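A rough sketch of the counter choice this describes; the function and parameters are placeholders for the pass's real bookkeeping.

// Pick the s_wait_xcnt value needed before redefining a register still in
// flight: VMEM XNACK returns in issue order, so newer events may be left
// outstanding; SMEM XNACK is unordered, so only a full drain (0) is safe.
static unsigned requiredXCntSketch(bool DependsOnSMEM,
                                   unsigned EventsIssuedAfterDependency) {
  if (DependsOnSMEM)
    return 0;                         // s_wait_xcnt 0
  return EventsIssuedAfterDependency; // events allowed to stay outstanding
}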
|
|
Use a function attribute (amdgpu-dynamic-vgpr) instead of a subtarget
feature, as requested in #130030.
|
|
These are renamed from B64 to I64 on gfx1250 and onwards:
S_CALL_I64
S_GET_PC_I64
S_RFE_I64
S_SET_PC_I64
S_SWAP_PC_I64
|
|
Move canGuaranteeTCO and mayTailCallThisCC into AMDGPUBaseInfo instead
of keeping two copies for the DAG and GlobalISel paths.
Also remove isKernelCC, which doesn't agree with isKernel and doesn't
seem very useful.
While at it, also move all the CC-related helpers into AMDGPUBaseInfo.h and
mark them constexpr.
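As a sketch of what such a constexpr helper might look like in AMDGPUBaseInfo.h; the exact set of calling conventions is an assumption from memory, not a quote of the header.

#include "llvm/IR/CallingConv.h"

// Assumed: fastcc is the only convention with fully guaranteed TCO.
constexpr bool canGuaranteeTCOSketch(llvm::CallingConv::ID CC) {
  return CC == llvm::CallingConv::Fast;
}

// Assumed: conventions that may still tail-call without the guarantee.
constexpr bool mayTailCallThisCCSketch(llvm::CallingConv::ID CC) {
  switch (CC) {
  case llvm::CallingConv::C:
  case llvm::CallingConv::AMDGPU_Gfx:
    return true;
  default:
    return canGuaranteeTCOSketch(CC);
  }
}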
|
|
|
|
scheduling (#125885)" (#139548)
This reapplies 067caaa and 382a085 (reverting b35f6e2) with fixes to
issues detected by the address sanitizer (MIs have to be removed from
live intervals before being removed from their parent MBB).
Original commit description below.
AMDGPU scheduler's `PreRARematStage` attempts to increase function
occupancy w.r.t. ArchVGPR usage by rematerializing trivial
ArchVGPR-defining instructions next to their single use. It first
collects all eligible trivially rematerializable instructions in the
function, then sinks them one-by-one while recomputing occupancy in all
affected regions each time to determine if and when it has managed to
increase overall occupancy. If it does, changes are committed to the
scheduler's state; otherwise modifications to the IR are reverted and
the scheduling stage gives up.
In both cases, this scheduling stage currently involves repeated queries
for up-to-date occupancy estimates and some state copying to enable
reversal of sinking decisions when occupancy is revealed not to
increase. The current implementation also does not accurately track
register pressure changes in all regions affected by sinking decisions.
This commit refactors this scheduling stage, improving RP tracking and
splitting the stage into two distinct steps to avoid repeated occupancy
queries and IR/state rollbacks.
- Analysis and collection (`canIncreaseOccupancyOrReduceSpill`). The
number of ArchVGPRs to save to reduce spilling or increase function
occupancy by 1 (when there is no spilling) is computed. Then,
instructions eligible for rematerialization are collected, stopping as
soon as enough have been identified to be able to achieve our goal
(according to slightly optimistic heuristics). If there aren't enough
such instructions, the scheduling stage stops here.
- Rematerialization (`rematerialize`). Instructions collected in the
first step are rematerialized one-by-one. Now we are able to directly
update the scheduler's state, since we have already done the occupancy
analysis and know we won't have to roll back any state. Register
pressure for impacted regions is recomputed only once, as opposed to
at every sinking decision.
In the case where the stage attempted to increase occupancy, if neither
the rematerializations alone nor rescheduling afterwards was able to improve
occupancy, then all rematerializations are rolled back.
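The two-step flow can be summarized with a short sketch; the two calls mirror the names above, but the surrounding function is illustrative, not the stage's real interface.

// Declarations stand in for the stage's real members.
bool canIncreaseOccupancyOrReduceSpill(); // step 1: analysis and collection
void rematerialize();                     // step 2: commit collected sinks

bool runPreRARematSketch() {
  // Step 1 decides whether the goal is reachable and gathers just enough
  // candidates; if not, stop early with the IR untouched.
  if (!canIncreaseOccupancyOrReduceSpill())
    return false;
  // Step 2 sinks the already-validated defs, so no per-sink rollback or
  // repeated occupancy query is needed; RP is recomputed once per region.
  rematerialize();
  return true;
}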
|
|
scheduling (#125885)" (#139341)
Also reverts the related "[AMDGPU] Regenerate mfma-loop.ll test" change.
The reverted change introduced a memory error detected by ASan (#125885).
This reverts commit 382a085a95b0abeac77b150b7b644b372bd08e78.
This reverts commit 067caaafb58a156d0d77229422607782a639f5b5.
|
|
All immediates are deferred now.
|
|
|
|
(#125885)
AMDGPU scheduler's `PreRARematStage` attempts to increase function
occupancy w.r.t. ArchVGPR usage by rematerializing trivial
ArchVGPR-defining instructions next to their single use. It first
collects all eligible trivially rematerializable instructions in the
function, then sinks them one-by-one while recomputing occupancy in all
affected regions each time to determine if and when it has managed to
increase overall occupancy. If it does, changes are committed to the
scheduler's state; otherwise modifications to the IR are reverted and
the scheduling stage gives up.
In both cases, this scheduling stage currently involves repeated queries
for up-to-date occupancy estimates and some state copying to enable
reversal of sinking decisions when occupancy is revealed not to
increase. The current implementation also does not accurately track
register pressure changes in all regions affected by sinking decisions.
This commit refactors this scheduling stage, improving RP tracking and
splitting the stage into two distinct steps to avoid repeated occupancy
queries and IR/state rollbacks.
- Analysis and collection (`canIncreaseOccupancyOrReduceSpill`). The
number of ArchVGPRs to save to reduce spilling or increase function
occupancy by 1 (when there is no spilling) is computed. Then,
instructions eligible for rematerialization are collected, stopping as
soon as enough have been identified to be able to achieve our goal
(according to slightly optimistic heuristics). If there aren't enough
such instructions, the scheduling stage stops here.
- Rematerialization (`rematerialize`). Instructions collected in the
first step are rematerialized one-by-one. Now we are able to directly
update the scheduler's state, since we have already done the occupancy
analysis and know we won't have to roll back any state. Register
pressure for impacted regions is recomputed only once, as opposed to
at every sinking decision.
In the case where the stage attempted to increase occupancy, if neither
the rematerializations alone nor rescheduling afterwards was able to improve
occupancy, then all rematerializations are rolled back.
|
|
Add fmac_f16_t16_e64 to the isfmac check to fix the VOP3 format of the
fmac_f16_t16 instruction.
|
|
|
|
The new function will return `std::nullopt` when any error occurs.
|
|
This reverts commit 68bcba6d7a1cc18996c0bcb7c62267c62d2040d0.
|
|
In dynamic VGPR mode, we can allocate up to 8 blocks of either 16 or 32
VGPRs (based on a chip-wide setting which we can model with a Subtarget
feature). Update some of the subtarget helpers to reflect this.
In particular:
- getVGPRAllocGranule is set to the block size
- getAddressableNumVGPRs will limit itself to 8 * the block size
We also try to be more careful about how many VGPR blocks we allocate.
Therefore, when deciding if we should revert scheduling after a given
stage, we check that we haven't increased the number of VGPR blocks that
need to be allocated.
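A minimal numeric sketch of the block accounting, assuming a block size of 16 or 32 as described; the helper name is illustrative, not the subtarget API.

// What matters when deciding to revert a scheduling stage is whether the
// number of blocks (not raw VGPRs) grew; the granule is one block and the
// addressable budget is 8 blocks.
static unsigned numVGPRBlocksSketch(unsigned NumVGPRs, unsigned BlockSize) {
  return (NumVGPRs + BlockSize - 1) / BlockSize; // round up to whole blocks
}
// Example: 40 VGPRs with 16-wide blocks need 3 blocks; growing to 49 VGPRs
// would need 4, so a stage causing that growth should be reverted.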
---------
Co-authored-by: Jannik Silvanus <jannik.silvanus@amd.com>
|
|
From GFX10 onwards it is possible to employ benevolent scheduling of
waves. This patch unconditionally enables, for the `amdhsa` OS, the bit
which controls that capability, as it is beneficial for algorithms that
rely on more complex concurrent coordination and it is generally
performance neutral otherwise.
|
|
This is an extension of #131357. Hopefully this will be the last one.
|
|
|
|
Simplify `cond ? val : false` to `cond && val` and similar.
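For illustration only (plain C++ for exposition; the actual simplification operates on the compiler's IR):

// Both forms agree for every input: the select yields val only when cond is
// true, which is exactly what the logical AND expresses.
bool before(bool cond, bool val) { return cond ? val : false; }
bool after(bool cond, bool val) { return cond && val; }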
|
|
Enable GlobalISel selection for uaddsat and usubsat in the true16 flow.
This patch includes:
1. Adding VGPR_16_Lo128/VGPR_16 to the register bank and updating the register
info to recognize the 16-bit register class ID and bit width
2. uaddsat/usubsat test updates
|
|
(#127673)
The previous patch, https://github.com/llvm/llvm-project/pull/114500, was
merged but hit a buildbot failure and was therefore reverted.
It seems AMDGPU::OpName::OPERAND_LAST was removed in the meantime,
which caused the compile error.
Fixed and reopened here.
|
|
(#114500)"
This reverts commit f7a5f067885b7f6cc4a000c8392adf6b777a9108.
Fails to build with:
llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:126:37: error: no member named 'OPERAND_LAST' in 'llvm::AMDGPU::OpName'
126 | uint16_t OpName = AMDGPU::OpName::OPERAND_LAST;
|