path: root/llvm/lib/Target/AMDGPU/Utils
Age | Commit message | Author | Files | Lines
5 days | [AMDGPU] Add GFX12 wave register names with WAVE_ prefix (#144352) | Aleksandar Spasojevic | 1 | -87/+79
Rename canonical register names with WAVE_ prefix for GFX12. Maintain backward compatibility through aliases.
5 days | AMDGPU: Ensure both wavesize features are not set (#159234) | Matt Arsenault | 1 | -0/+5
Make sure we cannot be in a mode with both wavesizes. This prevents assertions in a future change. This should probably just be an error, but we do not have a good way to report errors from the MCSubtargetInfo constructor.
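The invariant described above can be sketched as a small normalization step. This is only an illustration under stated assumptions: the two bit constants and `normalizeWaveSize` are hypothetical names, and the choice to keep wave32 when both bits are requested is an assumption, since the message does not say which feature the patch drops.

```cpp
#include <cstdint>

// Hypothetical feature bits; in LLVM the real feature enum is generated by TableGen.
constexpr uint32_t FeatureWavefrontSize32 = 1u << 0;
constexpr uint32_t FeatureWavefrontSize64 = 1u << 1;

// Returns a copy of Bits in which at most one wave-size feature is set.
// Assumption: when both are requested, wave32 is kept and wave64 is cleared.
constexpr uint32_t normalizeWaveSize(uint32_t Bits) {
  constexpr uint32_t Both = FeatureWavefrontSize32 | FeatureWavefrontSize64;
  if ((Bits & Both) == Both)
    Bits &= ~FeatureWavefrontSize64;
  return Bits;
}
```

Silently normalizing, rather than reporting an error, mirrors the constraint mentioned in the message that the constructor has no good way to report errors.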
7 days | [AMDGPU] Fix high vgpr printing with true16 (#160209) | Stanislav Mekhanoshin | 1 | -1/+10
7 days | [AMDGPU] Add PAL metadata names for 32 user SGPRs (#160126) | Jay Foad | 1 | -0/+16
Since #154205 some subtargets can use up to 32 user SGPRs. Add names for them all so they can be pretty printed in PAL metadata.
8 days | [AMDGPU] Simplify "class HasMember##member" with llvm::is_detected (NFC) (#160037) | Kazu Hirata | 1 | -10/+2
"class HasMember##member" detects a specific member with complex SFINAE logic involving multiple inheritance. This patch simplifies that by switching to llvm::is_detected.
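For readers unfamiliar with the detection idiom, here is a minimal standalone sketch. It defines its own `is_detected` on top of `std::void_t` so that it compiles without LLVM headers; `llvm::is_detected` plays the same role in the patch. The member name `NumVGPR` and the two structs are hypothetical.

```cpp
#include <type_traits>

// Local stand-in for the detection idiom (C++17).
template <typename, template <typename...> class Op, typename... Args>
struct Detector : std::false_type {};

// Selected only when Op<Args...> is a well-formed type.
template <template <typename...> class Op, typename... Args>
struct Detector<std::void_t<Op<Args...>>, Op, Args...> : std::true_type {};

template <template <typename...> class Op, typename... Args>
using is_detected = Detector<void, Op, Args...>;

// Well formed only when T has a member named NumVGPR (hypothetical name).
template <typename T> using NumVGPRMember = decltype(&T::NumVGPR);

struct WithMember { int NumVGPR; };
struct WithoutMember { int Other; };

static_assert(is_detected<NumVGPRMember, WithMember>::value);
static_assert(!is_detected<NumVGPRMember, WithoutMember>::value);
```

The only per-member code is the one-line alias, which is what makes this shorter than an inheritance-based SFINAE class.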
8 days | [AMDGPU] Simplify template metaprogramming in IsMCExpr##member (NFC) (#160005) | Kazu Hirata | 1 | -8/+5
Without this patch, we compute a type trait in a roundabout manner:
- Compute a boolean value in the primary template.
- Pass the value to std::enable_if_t.
- Return std::true_type (or std::false_type on the fallback path).
- Compare the return type to std::true_type.
That is, when the expression for the first boolean value above is well formed, we already have the answer we are looking for. This patch bypasses the entire sequence by having the primary template return std::bool_constant and adjusting RESULT to extract the ::value of the boolean type.
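The simplified shape can be sketched roughly as follows. This is not the code from the patch: `MCExpr` here is an empty stand-in type, the member name `Size` and the three structs are hypothetical, and `IsMCExprSize` plays the role that RESULT plays in the macro.

```cpp
#include <type_traits>

struct MCExpr {};                          // stand-in for llvm::MCExpr
struct WithExpr { const MCExpr *Size; };   // hypothetical: member is an MCExpr pointer
struct WithInt { unsigned Size; };         // hypothetical: member has another type
struct NoSize {};                          // hypothetical: member is absent

// Preferred overload: viable only when T::Size is well formed. The return
// type already encodes the answer as a std::bool_constant.
template <typename T>
auto isMCExprSize(int)
    -> std::bool_constant<std::is_same_v<decltype(T::Size), const MCExpr *>>;

// Fallback overload: chosen when T has no member named Size.
template <typename T> auto isMCExprSize(...) -> std::false_type;

// Extract ::value directly; no enable_if_t and no comparison to true_type.
template <typename T>
inline constexpr bool IsMCExprSize = decltype(isMCExprSize<T>(0))::value;

static_assert(IsMCExprSize<WithExpr>);
static_assert(!IsMCExprSize<WithInt>);
static_assert(!IsMCExprSize<NoSize>);
```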
2025-09-16 | [AMDGPU][MC] Keep MCOperands unencoded. (#158685) | Ivan Kosarev | 2 | -0/+31
We have proper encoding facilities to encode operands and instructions; there's no need to pollute the MC representation with encoding details. Supposed to be an NFCI, but happens to fix some re-encoded instruction codes in disassembler tests. The 64-bit operands are to be addressed in following patches introducing MC-level representation for lit() and lit64() modifiers, to then be respected by both the assembler and disassembler.
2025-09-15 | [AMDGPU][Attributor] Add `AAAMDGPUClusterDims` (#158076) | Shilei Tian | 1 | -1/+1
2025-09-12 | [AMDGPU] Support lowering of cluster related intrinsics (#157978) | Shilei Tian | 2 | -0/+92
Since much of the code is interconnected, this also changes how the workgroup ID is lowered.
Co-authored-by: Jay Foad <jay.foad@amd.com>
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-09-11 | [AMDGPU] Use subtarget call to determine number of VGPRs (#157927) | Stanislav Mekhanoshin | 1 | -3/+6
Since the register file was enlarged, it is no longer valid to call VGPR_32RegClass.getNumRegs() to get the total number of architectural registers available on a subtarget.
Fixes: SWDEV-550425
2025-09-08 | [AMDGPU] Add MSG_RTN_GET_CLUSTER_BARRIER_STATE (#157549) | Stanislav Mekhanoshin | 1 | -0/+2
2025-09-05 | [AMDGPU] Prevent VOPD combining of VGPRs with different MSBs (#157168) | Stanislav Mekhanoshin | 1 | -0/+4
2025-09-04 | [AMDGPU] High VGPR lowering on gfx1250 (#156965) | Stanislav Mekhanoshin | 2 | -0/+125
2025-09-04 | [AMDGPU] Ensure positive InstOffset for buffer operations (#145504) | Aleksandar Spasojevic | 1 | -1/+4
GFX12+ buffer ops require positive InstOffset per AMD hardware spec. Modified assembler/disassembler to reject negative buffer offsets.
2025-09-04 | [AMDGPU] Tail call support for whole wave functions (#145860) | Diana Picus | 1 | -0/+1
Support tail calls to whole wave functions (trivial) and from whole wave functions (slightly more involved because we need a new pseudo for the tail call return, that patches up the EXEC mask). Move the expansion of whole wave function return pseudos (regular and tail call returns) to prolog epilog insertion, since that's where we patch up the EXEC mask.
2025-09-03 | [AMDGPU] Define 1024 VGPRs on gfx1250 (#156765) | Stanislav Mekhanoshin | 1 | -1/+19
This is baseline support; it is not usable yet.
2025-09-03 | AMDGPU: Replace constexpr with inline | Matt Arsenault | 1 | -1/+1
One bot doesn't like this constexpr after d7484684
2025-09-03 | AMDGPU: Refactor isImmOperandLegal (#155607) | Matt Arsenault | 2 | -8/+8
The goal is to expose more variants that can operate without preconstructed MachineInstrs or MachineOperands.
2025-09-02 | [AMDGPU] Adjust VGPR allocation encoding on gfx1250 (#156546) | Stanislav Mekhanoshin | 1 | -0/+3
2025-09-02 | [AMDGPU] Fix hw stage metadata setting for unsigned values (#154502) | Ana Mihajlovic | 2 | -0/+12
2025-08-22 | [AMDGPU] Common up two local memory size calculations. NFCI. (#154784) | Jay Foad | 1 | -1/+1
2025-08-19 | [AMDGPU] Check noalias.addrspace in mayAccessScratchThroughFlat (#151319) | Pierre van Houtryve | 2 | -0/+28
PR #149247 made the MD accessible by the backend so we can now leverage it in the memory model. The first use case here is detecting if a flat op can access scratch memory. Benefits both the MemoryLegalizer and InsertWaitCnt.
2025-08-18 | [AMDGPU] User SGPR count increased to 32 on gfx1250 (#154205) | Stanislav Mekhanoshin | 1 | -1/+5
2025-08-14 | [AMDGPU] Don't allow wgp mode on gfx1250 (#153680) | Stanislav Mekhanoshin | 2 | -3/+19
gfx1250 only supports CU mode.
2025-08-14 | [AMDGPU] Increase LDS to 320K on gfx1250 (#153645) | Stanislav Mekhanoshin | 1 | -2/+4
2025-08-13 | [AMDGPU] Add MSG_SAVEWAVE_HAS_TDM on gfx1250 (#153483) | Stanislav Mekhanoshin | 1 | -1/+1
2025-08-13 | [AMDGPU] Add HW_REG_IB_STS2 on gfx1250 (#153479) | Stanislav Mekhanoshin | 1 | -1/+1
2025-08-11 | [AMDGPU] Per-subtarget DPP instruction classification (#153096) | Stanislav Mekhanoshin | 2 | -4/+37
This is NFCI at this point.
2025-08-07 | [AMDGPU] Restrict packed math FP32 instructions to read only one SGPR per operand on gfx12+ (#152465) | Stanislav Mekhanoshin | 2 | -0/+16
Sec. 4.6.7.1 of the gfx1250 SPG states that if an SGPR is used as an operand, only one SGPR will be read for both the low and high operations. As a result, the corresponding bits in `op_sel` and `op_sel_hi` must be the same when the operand is an SGPR.
Co-authored-by: Tian, Shilei <Shilei.Tian@amd.com>
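The `op_sel`/`op_sel_hi` constraint reduces to a single bitwise test. The sketch below is an illustration only: the function name and the `SGPRMask` parameter (bit I set when source operand I is an SGPR) are hypothetical and are not taken from the patch.

```cpp
// Returns true when op_sel and op_sel_hi agree on every bit position whose
// source operand is an SGPR. Bit I of each argument refers to source operand I.
// SGPRMask is a hypothetical input: bit I is set if operand I is an SGPR.
constexpr bool isOpSelLegalForSGPRs(unsigned OpSel, unsigned OpSelHi,
                                    unsigned SGPRMask) {
  // XOR marks the positions where the two modifiers differ; any difference
  // on an SGPR operand would require reading two different SGPRs.
  return ((OpSel ^ OpSelHi) & SGPRMask) == 0;
}

static_assert(isOpSelLegalForSGPRs(0b001, 0b001, 0b001));   // bits agree on the SGPR
static_assert(!isOpSelLegalForSGPRs(0b001, 0b000, 0b001));  // bits differ on the SGPR
static_assert(isOpSelLegalForSGPRs(0b001, 0b000, 0b010));   // differing operand is not an SGPR
```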
2025-08-06 | [AMDGPU] Add XNACK_STATE_PRIV and _MASK gfx1250 registers (#152374) | Stanislav Mekhanoshin | 1 | -0/+4
Co-authored-by: Pierre Vanhoutryve <pierre.vanhoutryve@amd.com>
2025-08-05 | [AMDGPU] Add MC support for new gfx1250 src_flat_scratch_base_lo/hi (#152203) | Stanislav Mekhanoshin | 1 | -0/+2
2025-08-01 | AMDGPU: Move asm constraint physreg parsing to utils (#150903) | Matt Arsenault | 2 | -0/+42
Also fixes an assertion on out of bound physical register indexes.
2025-07-30 | [AMDGPU] Add v_cvt_sr|pk_bf8|fp8_f16 gfx1250 instructions (#151415) | Stanislav Mekhanoshin | 2 | -0/+4
2025-07-28 | [AMDGPU] MC support for async load and store on gfx1250 (#151030) | Changpeng Fang | 1 | -1/+8
2025-07-21 | [AMDGPU] MC support for gfx1250 scale_offset modifier (#149881) | Stanislav Mekhanoshin | 2 | -0/+22
2025-07-21 | AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684) | Changpeng Fang | 2 | -0/+31
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-21 | [AMDGPU] ISel & PEI for whole wave functions (#145858) | Diana Picus | 2 | -1/+3
Whole wave functions are functions that will run with a full EXEC mask. They will not be invoked directly, but instead will be launched by way of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in a future patch). These functions are meant as an alternative to the `llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.

Whole wave functions will set EXEC to -1 in the prologue and restore the original value of EXEC in the epilogue. They must have a special first argument, `i1 %active`, that is going to be mapped to EXEC. They may have either the default calling convention or amdgpu_gfx. The inactive lanes need to be preserved for all registers used, active lanes only for the CSRs.

At the IR level, arguments to a whole wave function (other than `%active`) contain poison in their inactive lanes. Likewise, the return value for the inactive lanes is poison.

This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN, used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return a SReg_1 representing `%active`, which needs to be passed into SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the special handling of %active. Since the return may be in a different basic block, it's difficult to add the virtual reg for %active to SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also marks any used VGPRs as WWM registers, which are then spilled and restored with the usual logic.

Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic and a lot of optimization work (especially in order to reduce spills around function calls).

---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Co-authored-by: Shilei Tian <i@tianshilei.me>
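The EXEC handling described in this message can be pictured with a toy model. This is a simulation, not compiler code: `Wave`, `setupWholeWaveFunc`, `wholeWaveFuncReturn`, and `roundTripPreservesExec` are hypothetical names that only mirror the roles of the two pseudos, and a 64-lane mask is assumed.

```cpp
#include <cstdint>

// Toy model of a wave with a 64-bit EXEC mask (assumption: wave64).
struct Wave { uint64_t Exec; };

// Models the prologue pseudo: enable all lanes and hand back the original
// mask, which corresponds to the %active value.
constexpr uint64_t setupWholeWaveFunc(Wave &W) {
  uint64_t Active = W.Exec;
  W.Exec = ~uint64_t(0); // EXEC = -1
  return Active;
}

// Models the return pseudo: restore the mask saved by the prologue.
constexpr void wholeWaveFuncReturn(Wave &W, uint64_t Active) { W.Exec = Active; }

// EXEC is all ones inside the body and back to its entry value afterwards.
constexpr bool roundTripPreservesExec(uint64_t Entry) {
  Wave W{Entry};
  uint64_t Active = setupWholeWaveFunc(W);
  bool AllLanesInBody = W.Exec == ~uint64_t(0);
  wholeWaveFuncReturn(W, Active);
  return AllLanesInBody && W.Exec == Entry && Active == Entry;
}

static_assert(roundTripPreservesExec(0x0F));
```

Threading the saved mask from the setup to the return is the reason `%active` has to be passed into the return pseudo.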
2025-07-16 | AMDGPU: Treat WMMA XDL ops as TRANS in S_DELAY_ALU insertion for gfx1250 (#149208) | Changpeng Fang | 2 | -0/+15
WMMA XDL instructions are tracked as TRANS ops, and the compiler should treat them the same as TRANS in S_DELAY_ALU insertion. We use a searchable table for the InsertDelayAlu pass to recognize these WMMA XDL instructions.
Co-authored-by: Stefan Stipanovic <Stefan.Stipanovic@amd.com>
2025-07-14 | [AMDGPU] Add gfx1250 v_fmac_f64 implementation (#148725) | Stanislav Mekhanoshin | 1 | -0/+1
2025-07-11 | [AMDGPU] MC support for v_fmaak_f64/v_fmamk_f64 gfx1250 instructions (#148282) | Stanislav Mekhanoshin | 1 | -0/+1
2025-07-10 | [AMDGPU] VOPD/VOPD3 changes for gfx1250 (#147602) | Stanislav Mekhanoshin | 2 | -44/+215
2025-07-09 | [AMDGPU] gfx1250: MC support for 64-bit literals (#147861) | Stanislav Mekhanoshin | 1 | -1/+1
2025-07-03 | AMDGPU: Implement tensor load and store instructions for gfx1250 (#146636) | Changpeng Fang | 2 | -0/+31
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-01 | [AMDGPU] Remove an unnecessary cast (NFC) (#146548) | Kazu Hirata | 1 | -1/+1
Val is already of type uint64_t.
2025-06-25 | [AMDGPU][GFX1250] Insert S_WAIT_XCNT for SMEM and VMEM load-stores (#145566) | Christudasan Devadasan | 2 | -4/+18
This patch tracks the register operands of both VMEM (FLAT, MUBUF, MTBUF) and SMEM load-store operations and inserts an S_WAIT_XCNT instruction with a sufficient wait count before potentially redefining them. For VMEM instructions, XNACK is returned in the same order as they were issued, and hence non-zero counter values can be inserted. However, SMEM execution is out-of-order and so is their XNACK reception. Thus, only a zero counter value can be inserted to capture SMEM dependencies.
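The in-order versus out-of-order distinction can be made concrete with a toy model. The sketch assumes a counter that tracks outstanding operations of a single kind and drops by one per completion; `MemKind` and `xcntToWaitFor` are hypothetical names, and the pass itself tracks considerably more state than this.

```cpp
enum class MemKind { VMEM, SMEM };

// YoungerOps: number of outstanding operations issued after the one whose
// register operands are about to be redefined.
// Returns the largest counter value that still proves that operation is done.
constexpr unsigned xcntToWaitFor(MemKind Kind, unsigned YoungerOps) {
  // In-order completion (VMEM): once the counter has dropped to YoungerOps,
  // every older operation, including the one of interest, has completed.
  // Out-of-order completion (SMEM): only a counter of zero proves it.
  return Kind == MemKind::VMEM ? YoungerOps : 0;
}

static_assert(xcntToWaitFor(MemKind::VMEM, 2) == 2);
static_assert(xcntToWaitFor(MemKind::SMEM, 2) == 0);
```

A non-zero wait value lets younger VMEM operations stay in flight, which is the benefit of in-order return.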
2025-06-24 | [AMDGPU] Replace dynamic VGPR feature with attribute (#133444) | Diana Picus | 2 | -21/+62
Use a function attribute (amdgpu-dynamic-vgpr) instead of a subtarget feature, as requested in #130030.
2025-06-21 | [AMDGPU] Rename call instructions from b64 to i64 (#145103) | Stanislav Mekhanoshin | 2 | -0/+5
These are renamed from B64 to I64 on gfx1250 and later: S_CALL_I64, S_GET_PC_I64, S_RFE_I64, S_SET_PC_I64, S_SWAP_PC_I64.
2025-06-05 | [AMDGPU] Remove duplicated/confusing helpers. NFCI (#142598) | Diana Picus | 2 | -77/+75
Move canGuaranteeTCO and mayTailCallThisCC into AMDGPUBaseInfo instead of keeping two copies for DAG/Global ISel. Also remove isKernelCC, which doesn't agree with isKernel and doesn't seem very useful. While at it, also move all the CC-related helpers into AMDGPUBaseInfo.h and mark them constexpr.
2025-05-25 | Replace #include MCAsmLexer.h with AsmLexer.h | Fangrui Song | 1 | -1/+1
MCAsmLexer.h has been made a forwarder header since #134207.
2025-05-14 | [AMDGPU] Use std::optional::value_or (NFC) (#140006) | Kazu Hirata | 1 | -1/+1