Age | Commit message | Author | Files | Lines |
|
|
|
We have proper encoding facilities to encode operands and instructions;
there's no need to pollute the MC representation with encoding details.
Intended to be an NFCI, but it happens to fix some re-encoded instruction
codes in disassembler tests.
The 64-bit operands are to be addressed in follow-up patches introducing
an MC-level representation for the lit() and lit64() modifiers, to then be
respected by both the assembler and disassembler.
|
|
Since much of the related code is interconnected, this also changes how the workgroup ID is lowered.
Co-authored-by: Jay Foad <jay.foad@amd.com>
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
|
|
Since the register file size was increased, it is no longer valid to
call VGPR_32RegClass.getNumRegs() to get the total number of arch
registers available on a subtarget.
Fixes: SWDEV-550425
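A minimal sketch of the distinction, not the actual patch: it assumes GCNSubtarget::getAddressableNumVGPRs() (or an equivalent subtarget query) reflects the per-subtarget limit, and the wrapper name is purely illustrative.

#include "GCNSubtarget.h"

// Illustrative only: the register class now describes the largest possible
// register file, so its size over-counts on smaller subtargets.
static unsigned getNumArchVGPRsSketch(const llvm::GCNSubtarget &ST) {
  // Old (now wrong): class size == per-subtarget register count.
  //   return llvm::AMDGPU::VGPR_32RegClass.getNumRegs();
  // New: ask the subtarget itself (assumed helper; see the actual change).
  return ST.getAddressableNumVGPRs();
}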
|
|
|
|
|
|
GFX12+ buffer ops require a non-negative InstOffset per the AMD hardware spec.
The assembler and disassembler were modified to reject negative buffer offsets.
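A minimal sketch of the kind of check this implies on the assembler side; only AMDGPU::isGFX12Plus() is a real helper here, the function name and diagnostics are assumptions.

#include "Utils/AMDGPUBaseInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include <cstdint>

// Sketch: GFX12+ cannot encode a negative buffer offset, so reject it early.
static bool isValidBufferOffsetSketch(const llvm::MCSubtargetInfo &STI,
                                      int64_t Offset) {
  if (llvm::AMDGPU::isGFX12Plus(STI) && Offset < 0)
    return false; // caller would emit an "invalid offset"-style diagnostic
  return true;    // pre-GFX12 targets keep accepting signed offsets
}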
|
|
This is baseline support; it is not usable yet.
|
|
The goal is to expose more variants that can operate without
preconstructed MachineInstrs or MachineOperands.
|
|
|
|
|
|
PR #149247 made the MD accessible to the backend, so we can now leverage
it in the memory model. The first use case here is detecting whether a flat op
can access scratch memory.
This benefits both the MemoryLegalizer and InsertWaitCnt.
|
|
|
|
- gfx1250 only supports cu mode
|
|
|
|
This is NFCI at this point.
|
|
operand on gfx12+ (#152465)
Sec. 4.6.7.1 of the gfx1250 SPG states that if an SGPR is used
as an operand, only one SGPR will be read for both the low and high
operations. As a result, the corresponding bits in `op_sel` and
`op_sel_hi` must be the same when the operand is an SGPR.
Co-authored-by: Tian, Shilei <Shilei.Tian@amd.com>
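A small sketch of the stated constraint, with placeholder parameters rather than the real operand-verification code.

// For a packed operand that is an SGPR, the low and high halves read the same
// register, so the corresponding op_sel and op_sel_hi bits must agree.
static bool isValidPackedSGPROpSelSketch(bool OperandIsSGPR, bool OpSelBit,
                                         bool OpSelHiBit) {
  if (!OperandIsSGPR)
    return true;                 // VGPR operands may select halves independently
  return OpSelBit == OpSelHiBit; // one SGPR read feeds both halves
}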
|
|
|
|
Also fixes an assertion on out-of-bounds physical register
indexes.
|
|
|
|
|
|
|
|
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
|
|
(#149208)
WMMA XDL instructions are tracked as TRANS ops, and the compiler should
consider them the same as TRANS during S_DELAY_ALU insertion. We use a searchable
table so that the InsertDelayAlu pass can recognize these WMMA XDL instructions.
Co-authored-by: Stefan Stipanovic <Stefan.Stipanovic@amd.com>
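As a sketch of the classification involved, with both lookups standing in for the generated searchable table and TII queries (assumed names):

// Placeholder lookups; the real pass would consult TII and the new table.
bool isTransOpSketch(unsigned Opcode);
bool isWMMAXDLTableLookupSketch(unsigned Opcode);

// WMMA XDL ops occupy the TRANS pipe for delay purposes, so they receive the
// same s_delay_alu treatment as native TRANS instructions.
static bool needsTransDelaySketch(unsigned Opcode) {
  return isTransOpSketch(Opcode) || isWMMAXDLTableLookupSketch(Opcode);
}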
|
|
|
|
|
|
|
|
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
|
|
This patch tracks the register operands of both VMEM (FLAT, MUBUF,
MTBUF) and SMEM load-store operations and inserts an S_WAIT_XCNT
instruction with a sufficient wait-count before potentially redefining
them. For VMEM instructions, XNACK is returned in the same order as
they were issued, and hence non-zero counter values can be inserted.
However, SMEM execution is out of order, and so is its XNACK reception.
Thus, only a zero counter value can be inserted to capture SMEM dependencies.
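A rough sketch of the counter choice this describes; the function and parameters are placeholders for the pass's real bookkeeping.

// Pick the s_wait_xcnt value needed before redefining a register still in
// flight: VMEM XNACK returns in issue order, so newer events may be left
// outstanding; SMEM XNACK is unordered, so only a full drain (0) is safe.
static unsigned requiredXCntSketch(bool DependsOnSMEM,
                                   unsigned EventsIssuedAfterDependency) {
  if (DependsOnSMEM)
    return 0;                         // s_wait_xcnt 0
  return EventsIssuedAfterDependency; // events allowed to stay outstanding
}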
|
|
Use a function attribute (amdgpu-dynamic-vgpr) instead of a subtarget
feature, as requested in #130030.
|
|
These are renamed from B64 to I64 on gfx1250 and onwards:
S_CALL_I64
S_GET_PC_I64
S_RFE_I64
S_SET_PC_I64
S_SWAP_PC_I64
|
|
Move canGuaranteeTCO and mayTailCallThisCC into AMDGPUBaseInfo instead
of keeping two copies for the DAG and GlobalISel paths.
Also remove isKernelCC, which doesn't agree with isKernel and doesn't
seem very useful.
While at it, also move all the CC-related helpers into AMDGPUBaseInfo.h and
mark them constexpr.
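As a sketch of what such a constexpr helper might look like in AMDGPUBaseInfo.h; the exact set of calling conventions is an assumption from memory, not a quote of the header.

#include "llvm/IR/CallingConv.h"

// Assumed: fastcc is the only convention with fully guaranteed TCO.
constexpr bool canGuaranteeTCOSketch(llvm::CallingConv::ID CC) {
  return CC == llvm::CallingConv::Fast;
}

// Assumed: conventions that may still tail-call without the guarantee.
constexpr bool mayTailCallThisCCSketch(llvm::CallingConv::ID CC) {
  switch (CC) {
  case llvm::CallingConv::C:
  case llvm::CallingConv::AMDGPU_Gfx:
    return true;
  default:
    return canGuaranteeTCOSketch(CC);
  }
}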
|
|
|
|
scheduling (#125885)" (#139548)
This reapplies 067caaa and 382a085 (reverting b35f6e2) with fixes to
issues detected by the address sanitizer (MIs have to be removed from
live intervals before being removed from their parent MBB).
Original commit description below.
AMDGPU scheduler's `PreRARematStage` attempts to increase function
occupancy w.r.t. ArchVGPR usage by rematerializing trivial
ArchVGPR-defining instructions next to their single use. It first
collects all eligible trivially rematerializable instructions in the
function, then sinks them one-by-one while recomputing occupancy in all
affected regions each time to determine if and when it has managed to
increase overall occupancy. If it does, changes are committed to the
scheduler's state; otherwise modifications to the IR are reverted and
the scheduling stage gives up.
In both cases, this scheduling stage currently involves repeated queries
for up-to-date occupancy estimates and some state copying to enable
reversal of sinking decisions when occupancy is revealed not to
increase. The current implementation also does not accurately track
register pressure changes in all regions affected by sinking decisions.
This commit refactors this scheduling stage, improving RP tracking and
splitting the stage into two distinct steps to avoid repeated occupancy
queries and IR/state rollbacks.
- Analysis and collection (`canIncreaseOccupancyOrReduceSpill`). The
number of ArchVGPRs to save to reduce spilling or increase function
occupancy by 1 (when there is no spilling) is computed. Then,
instructions eligible for rematerialization are collected, stopping as
soon as enough have been identified to be able to achieve our goal
(according to slightly optimistic heuristics). If there aren't enough
such instructions, the scheduling stage stops here.
- Rematerialization (`rematerialize`). Instructions collected in the
first step are rematerialized one-by-one. Now we are able to directly
update the scheduler's state, since we have already done the occupancy
analysis and know we won't have to roll back any state. Register
pressure for impacted regions is recomputed only once, as opposed to
at every sinking decision.
In the case where the stage attempted to increase occupancy, if neither
the rematerializations alone nor rescheduling afterwards was able to improve
occupancy, then all rematerializations are rolled back.
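The two-step flow can be summarized with a short sketch; the two calls mirror the names above, but the surrounding function is illustrative, not the stage's real interface.

// Declarations stand in for the stage's real members.
bool canIncreaseOccupancyOrReduceSpill(); // step 1: analysis and collection
void rematerialize();                     // step 2: commit collected sinks

bool runPreRARematSketch() {
  // Step 1 decides whether the goal is reachable and gathers just enough
  // candidates; if not, stop early with the IR untouched.
  if (!canIncreaseOccupancyOrReduceSpill())
    return false;
  // Step 2 sinks the already-validated defs, so no per-sink rollback or
  // repeated occupancy query is needed; RP is recomputed once per region.
  rematerialize();
  return true;
}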
|
|
scheduling (#125885)" (#139341)
Also reverts the related "[AMDGPU] Regenerate mfma-loop.ll test" change.
The reverted change introduced a memory error detected by ASan (#125885).
This reverts commit 382a085a95b0abeac77b150b7b644b372bd08e78.
This reverts commit 067caaafb58a156d0d77229422607782a639f5b5.
|
|
All immediates are deferred now.
|
|
|
|
(#125885)
AMDGPU scheduler's `PreRARematStage` attempts to increase function
occupancy w.r.t. ArchVGPR usage by rematerializing trivial
ArchVGPR-defining instructions next to their single use. It first
collects all eligible trivially rematerializable instructions in the
function, then sinks them one-by-one while recomputing occupancy in all
affected regions each time to determine if and when it has managed to
increase overall occupancy. If it does, changes are committed to the
scheduler's state; otherwise modifications to the IR are reverted and
the scheduling stage gives up.
In both cases, this scheduling stage currently involves repeated queries
for up-to-date occupancy estimates and some state copying to enable
reversal of sinking decisions when occupancy is revealed not to
increase. The current implementation also does not accurately track
register pressure changes in all regions affected by sinking decisions.
This commit refactors this scheduling stage, improving RP tracking and
splitting the stage into two distinct steps to avoid repeated occupancy
queries and IR/state rollbacks.
- Analysis and collection (`canIncreaseOccupancyOrReduceSpill`). The
number of ArchVGPRs to save to reduce spilling or increase function
occupancy by 1 (when there is no spilling) is computed. Then,
instructions eligible for rematerialization are collected, stopping as
soon as enough have been identified to be able to achieve our goal
(according to slightly optimistic heuristics). If there aren't enough
such instructions, the scheduling stage stops here.
- Rematerialization (`rematerialize`). Instructions collected in the
first step are rematerialized one-by-one. Now we are able to directly
update the scheduler's state, since we have already done the occupancy
analysis and know we won't have to roll back any state. Register
pressure for impacted regions is recomputed only once, as opposed to
at every sinking decision.
In the case where the stage attempted to increase occupancy, if neither
the rematerializations alone nor rescheduling afterwards was able to improve
occupancy, then all rematerializations are rolled back.
|
|
Add fmac_f16_t16_e64 to the isfmac check to fix the VOP3 format of the
fmac_f16_t16 instruction.
|
|
|
|
The new function will return `std::nullopt` when any error occurs.
|
|
This reverts commit 68bcba6d7a1cc18996c0bcb7c62267c62d2040d0.
|
|
In dynamic VGPR mode, we can allocate up to 8 blocks of either 16 or 32
VGPRs (based on a chip-wide setting which we can model with a Subtarget
feature). Update some of the subtarget helpers to reflect this.
In particular:
- getVGPRAllocGranule is set to the block size
- getAddressableNumVGPRs will limit itself to 8 * the block size
We also try to be more careful about how many VGPR blocks we allocate.
Therefore, when deciding if we should revert scheduling after a given
stage, we check that we haven't increased the number of VGPR blocks that
need to be allocated.
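A minimal numeric sketch of the block accounting, assuming a block size of 16 or 32 as described; the helper name is illustrative, not the subtarget API.

// What matters when deciding to revert a scheduling stage is whether the
// number of blocks (not raw VGPRs) grew; the granule is one block and the
// addressable budget is 8 blocks.
static unsigned numVGPRBlocksSketch(unsigned NumVGPRs, unsigned BlockSize) {
  return (NumVGPRs + BlockSize - 1) / BlockSize; // round up to whole blocks
}
// Example: 40 VGPRs with 16-wide blocks need 3 blocks; growing to 49 VGPRs
// would need 4, so a stage causing that growth should be reverted.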
---------
Co-authored-by: Jannik Silvanus <jannik.silvanus@amd.com>
|
|
From GFX10 onwards it is possible to employ benevolent scheduling of
waves. This patch unconditionally enables, for the `amdhsa` OS, the bit
which controls that capability, as it is beneficial for algorithms that
rely on more complex concurrent coordination and it is generally
performance neutral otherwise.
|
|
This is an extension of #131357. Hopefully this will be the last one.
|
|
|
|
Simplify `cond ? val : false` to `cond && val` and similar.
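For illustration only (plain C++ for exposition; the actual simplification operates on the compiler's IR):

// Both forms agree for every input: the select yields val only when cond is
// true, which is exactly what the logical AND expresses.
bool before(bool cond, bool val) { return cond ? val : false; }
bool after(bool cond, bool val) { return cond && val; }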
|
|
Enable GlobalISel selection for uaddsat and usubsat in the true16 flow.
This patch includes:
1. Adding VGPR_16_Lo128/VGPR_16 to the register bank and updating the register
info to recognize the 16-bit register class ID and bit width
2. uaddsat/usubsat test updates
|
|
(#127673)
The previous patch, https://github.com/llvm/llvm-project/pull/114500, was
merged but hit a buildbot failure and was therefore reverted.
It seems AMDGPU::OpName::OPERAND_LAST was removed in the meantime,
which caused the compile error.
Fixed and reopened here.
|
|
(#114500)"
This reverts commit f7a5f067885b7f6cc4a000c8392adf6b777a9108.
Fails to build with:
llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:126:37: error: no member named 'OPERAND_LAST' in 'llvm::AMDGPU::OpName'
126 | uint16_t OpName = AMDGPU::OpName::OPERAND_LAST;
|