Age | Commit message (Collapse) | Author | Files | Lines |
|
Rename canonical register names with WAVE_ prefix for GFX12
Maintain backward compatibility through aliases
|
|
Make sure we cannot be in a mode with both wavesizes. This
prevents assertions in a future change. This should probably
just be an error, but we do not have a good way to report
errors from the MCSubtargetInfo constructor.
|
|
|
|
Since #154205 some subtargets can use up to 32 user SGPRs. Add names for
them all so they can be pretty printed in PAL metadata.
|
|
(#160037)
"class HasMember##member" detects a specific member with a complex
SFINAE logic involving multiple inheritance. This patch simplifies
that by switching to llvm::is_detected.
|
|
Without this patch, we compute a type trait in a roundabout manner:
- Compute a boolean value in the primary template.
- Pass the value to std::enable_if_t.
- Return std::true_type (or std::false_type on the fallback path).
- Compare the return type to std::true_type.
That is, when the expression for the first boolean value above is well
formed, we already have the answer we are looking for.
This patch bypasses the entire sequence by having the primary template
return std::bool_constant and adjusting RESULT to extract the ::value
of the boolean type.
|
|
We have proper encoding facilities to encode operands and instructions;
there's no need to pollute the MC representation with encoding details.
Supposed to be an NFCI, but happens to fix some re-encoded instruction
codes in disassembler tests.
The 64-bit operands are to be addressed in following patches introducing
MC-level representation for lit() and lit64() modifiers, to then be
respected by both the assembler and disassembler.
|
|
|
|
Since many code are connected, this also changes how workgroup id is lowered.
Co-authored-by: Jay Foad <jay.foad@amd.com>
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
|
|
Since the register file was increased that is no longer valid to
call VGPR_32RegClass.getNumregs() to get a total number of arch
registers available on a subtarget.
Fixes: SWDEV-550425
|
|
|
|
|
|
|
|
GFX12+ buffer ops require positive InstOffset per AMD hardware spec.
Modified assembler/disassembler to reject negative buffer offsets.
|
|
Support tail calls to whole wave functions (trivial) and from whole wave
functions (slightly more involved because we need a new pseudo for the
tail call return, that patches up the EXEC mask).
Move the expansion of whole wave function return pseudos (regular and
tail call returns) to prolog epilog insertion, since that's where we
patch up the EXEC mask.
|
|
This is a baseline support, it is not useable yet.
|
|
One bot doesn't like this constexpr after d7484684
|
|
The goal is to expose more variants that can operate without
preconstructed MachineInstrs or MachineOperands.
|
|
|
|
|
|
|
|
PR #149247 made the MD accessible by the backend so we can now leverage
it in the memory model. The first use case here is detecting if a flat op
can access scratch memory.
Benefits both the MemoryLegalizer and InsertWaitCnt.
|
|
|
|
- gfx1250 only supports cu mode
|
|
|
|
|
|
|
|
This is NFCI at this point.
|
|
operand on gfx12+ (#152465)
Sec. 4.6.7.1 of the gfx1250 SPG states that if an SGPR is used
as an operand, only one SGPR will be read for both the low and high
operations. As a result, the corresponding bits in `op_sel` and
`op_sel_hi` must be the same when the operand is an SGPR.
Co-authored-by: Tian, Shilei <Shilei.Tian@amd.com>
Co-authored-by: Tian, Shilei <Shilei.Tian@amd.com>
|
|
Co-authored-by: Pierre Vanhoutryve <pierre.vanhoutryve@amd.com>
Co-authored-by: Pierre Vanhoutryve <pierre.vanhoutryve@amd.com>
|
|
|
|
Also fixes an assertion on out of bound physical register
indexes.
|
|
|
|
|
|
|
|
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
|
|
Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in
a future patch). These functions are meant as an alternative to the
`llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.
Whole wave functions will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, `i1 %active`, that is going to be mapped to EXEC. They may
have either the default calling convention or amdgpu_gfx. The inactive
lanes need to be preserved for all registers used, active lanes only for
the CSRs.
At the IR level, arguments to a whole wave function (other than
`%active`) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.
This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN
used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
a SReg_1 representing `%active`, which needs to be passed into
SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the
special handling of %active. Since the return may be in a different
basic block, it's difficult to add the virtual reg for %active to
SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF
which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
marks any used VGPRs as WWM registers, which are then spilled and
restored with the usual logic.
Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic
and a lot of optimization work (especially in order to reduce spills
around function calls).
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Co-authored-by: Shilei Tian <i@tianshilei.me>
|
|
(#149208)
WMMA XDL instructions are tracked as TRANs ops and the compiler should
consider them the same as TRANS in S_DELAY_ALU insertion. We use a searchable
table for the InsertDelayAlu pass to recognize these WMMA XDL instructions.
Co-authored-by: Stefan Stipanovic <Stefan.Stipanovic@amd.com>
|
|
|
|
|
|
|
|
|
|
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
|
|
Val is already of uint64_t.
|
|
This patch tracks the register operands of both VMEM (FLAT, MUBUF,
MTBUF) and SMEM load-store operations and inserts a S_WAIT_XCNT
instruction with sufficient wait-count before potentially redefining
them. For VMEM instructions, XNACK is returned in the same order as
they were issued and hence non-zero counter values can be inserted.
However, SMEM execution is out-of-order and so is their XNACK reception.
Thus, only zero counter value can be inserted to capture SMEM dependencies.
|
|
Use a function attribute (amdgpu-dynamic-vgpr) instead of a subtarget
feature, as requested in #130030.
|
|
These get renamed in gfx1250 and on from B64 to I64:
S_CALL_I64
S_GET_PC_I64
S_RFE_I64
S_SET_PC_I64
S_SWAP_PC_I64
|
|
Move canGuaranteeTCO and mayTailCallThisCC into AMDGPUBaseInfo instead
of keeping two copies for DAG/Global ISel.
Also remove isKernelCC, which doesn't agree with isKernel and doesn't
seem very useful.
While at it, also move all the CC-related helpers into AMDGPUBaseInfo.h and
mark them constexpr.
|
|
MCAsmLexer.h has been made a forwarder header since #134207
|
|
|