path: root/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
Age | Commit message | Author | Files | Lines
2024-02-23 | [AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772) | Ivan Kosarev | 1 | -5/+40
These are hoped to provide more convenient and less error prone facilities to encode and decode fields than manually defined constants and functions.
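As a rough illustration of what such a helper could look like, here is a minimal sketch of a compile-time bit-field descriptor with `encode` and `decode` operations. The `BitField` template, the `ExampleLow`/`ExampleHigh` aliases and their bit positions are hypothetical names made up for this example; they are not the definitions added to AMDGPUBaseInfo.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the LLVM definition): describes a field that
// occupies bits [LowBit, HighBit] of a 64-bit encoding.
template <unsigned HighBit, unsigned LowBit> struct BitField {
  static_assert(LowBit <= HighBit && HighBit < 64, "invalid bit range");
  static constexpr unsigned Width = HighBit - LowBit + 1;
  static constexpr uint64_t Mask = ~uint64_t{0} >> (64 - Width);

  // Return Encoding with this field replaced by Value.
  static uint64_t encode(uint64_t Encoding, uint64_t Value) {
    assert((Value & ~Mask) == 0 && "value does not fit in the field");
    return (Encoding & ~(Mask << LowBit)) | (Value << LowBit);
  }

  // Extract this field from Encoding.
  static uint64_t decode(uint64_t Encoding) {
    return (Encoding >> LowBit) & Mask;
  }
};

// Illustrative layout only; these bit positions are invented for the example.
using ExampleLow = BitField<3, 0>;
using ExampleHigh = BitField<11, 8>;
```

Keeping the shift and mask inside one type means each field's position is written once, which is the kind of duplication that manually defined constants tend to introduce.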
2024-02-22 | [AMDGPU][NFC] Refactor SIInsertWaitcnts zero waitcnt generation (#82575) | vangthao95 | 1 | -9/+0
Move the allZero* waitcnt generation methods into WaitcntGenerator class.
2024-02-16 | [AMDGPU] Use `bf16` instead of `i16` for bfloat (#80908) | Shilei Tian | 1 | -0/+16
Currently we generally use `i16` to represent `bf16` in those tablegen files. This patch is trying to use `bf16` directly. Fix #79369.
2024-02-12 | [AMDGPU][MC] Fix printing vcc(_lo) twice for VOPC DPP instructions (#81158) | Mirko Brkušanin | 1 | -0/+3
2024-02-12 | [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955) | Pierre van Houtryve | 1 | -0/+11
These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPUs, at the cost of fewer optimization opportunities. Note that this is just doing the compiler side of things; device libs and runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. This contains the documentation changes for both this change and #76954 as well.
2024-02-07 | Revert "[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104)" | Carl Ritson | 1 | -5/+0
This reverts commit d6c7253d32e4bdff619c39708170f1c1fa01ff95. Change causing CTS failures due to incomplete metadata.
2024-02-06 | [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104) | David Stuttard | 1 | -0/+5
PAL Metadata 3.0 introduces an explicit structure in metadata for the programmable registers written out by the compiler backend. The previous approach used opaque registers which can change between different architectures and required encoding the bitfield information in the backend, which may change between versions. This change is an extension of the previously added support, which only handled entry functions. This adds support for all functions. The change also includes some refactoring to separate common code.
2024-02-05 | [AMDGPU] Introduce Code Object V6 (#76954) | Pierre van Houtryve | 1 | -1/+1
Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs changes are part of the follow-up patch #76955.
2024-02-01 | [llvm-objdump][AMDGPU] Pass ELF ABIVersion through disassembler (#78907) | Emma Pilkington | 1 | -1/+4
Admittedly, it's a bit ugly to pass the ABIVersion through onSymbolStart, but I'm not sure what a better place for it would be.
2024-01-24 | [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/bf8 instructions (#78414) | Mariusz Sikora | 1 | -0/+3
Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300):
- V_CVT_F32_FP8
- V_CVT_F32_BF8
- V_CVT_PK_F32_FP8
- V_CVT_PK_F32_BF8
- V_CVT_PK_FP8_F32
- V_CVT_PK_BF8_F32
- V_CVT_SR_FP8_F32
- V_CVT_SR_BF8_F32
Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
2024-01-21 | [AMDGPU] Add an asm directive to track code_object_version (#76267) | Emma Pilkington | 1 | -19/+12
Named '.amdhsa_code_object_version'. This directive sets the e_ident[ABIVERSION] in the ELF header, and should be used as the assumed COV for the rest of the asm file. This commit also weakens the --amdhsa-code-object-version CL flag. Previously, the CL flag took precedence over the IR flag. Now the IR flag/asm directive take precedence over the CL flag. This is implemented by merging a few COV-checking functions in AMDGPUBaseInfo.h.
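The precedence described above can be sketched as a small resolver. This is only an illustration under stated assumptions: `resolveCodeObjectVersion`, its parameters and `DefaultCodeObjectVersionSketch` are hypothetical names, the default value is a placeholder, and the lower bound of 4 reflects the removal of V2 and V3 mentioned elsewhere in this log.

```cpp
#include <cassert>
#include <optional>

// Placeholder default for this sketch; the real default is defined by the
// compiler and may differ.
constexpr unsigned DefaultCodeObjectVersionSketch = 5;

// Hypothetical resolver: a version recorded in the IR module flag or set by
// the asm directive wins over the command-line flag, which in turn wins over
// the built-in default.
unsigned resolveCodeObjectVersion(std::optional<unsigned> FromModuleOrDirective,
                                  std::optional<unsigned> FromCommandLine) {
  unsigned Version = DefaultCodeObjectVersionSketch;
  if (FromModuleOrDirective)
    Version = *FromModuleOrDirective;
  else if (FromCommandLine)
    Version = *FromCommandLine;
  // Assumption for this sketch: versions below 4 can no longer be emitted.
  assert(Version >= 4 && "unsupported code object version");
  return Version;
}
```

Funnelling every query through one function is what makes the precedence order easy to change in a single place.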
2024-01-18 | [AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438) | Jay Foad | 1 | -25/+98
Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into separate LOADCNT, SAMPLECNT and BVHCNT counters.
2024-01-04 | AMDGPU: Fix packed 16-bit inline constants (#76522) | Nicolai Hähnle | 1 | -4/+7
Consistently treat packed 16-bit operands as 32-bit values, because that's really what they are. The attempt to treat them differently was ultimately incorrect and led to miscompiles, e.g. when using non-splat constants such as (1, 0) as operands.
Recognize 32-bit float constants for i/u16 instructions. This is a bit odd conceptually, but it matches HW behavior and SP3.
Remove isFoldableLiteralV216; there was too much magic in the dependency between it and its use in SIFoldOperands. Instead, we now simply rely on checking whether a constant is an inline constant, and trying a bunch of permutations of the low and high halves. This is more obviously correct and leads to some new cases where inline constants are used as shown by tests.
Move the logic for switching packed add vs. sub into SIFoldOperands. This has two benefits: all logic that optimizes for inline constants in packed math is now in one place; and it applies to both SelectionDAG and GISel paths.
Disable the use of opsel with v_dot* instructions on gfx11. They are documented to ignore opsel on src0 and src1. It may be interesting to re-enable the use of opsel on src2 as a future optimization.
A similar "proper" fix of what inline constants mean could potentially be applied to unpacked 16-bit ops. However, it's less clear what the benefit would be, and there are surely places where we'd have to carefully audit whether values are properly sign- or zero-extended. It is best to keep such a change separate.
Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)
2023-12-14 | [AMDGPU][MC] Add GFX12 VFLAT, VSCRATCH and VGLOBAL encodings (#75193) | Mirko Brkušanin | 1 | -1/+1
2023-12-13 | [AMDGPU] GFX12: Add Split Workgroup Barrier (#74836) | Mariusz Sikora | 1 | -0/+1
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-12 | [AMDGPU] Update VOP instructions for GFX12 (#74853) | Mariusz Sikora | 1 | -5/+14
Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>
2023-12-04 | [AMDGPU][MC] Add GFX12 VIMAGE and VSAMPLE encodings (#74062) | Mirko Brkušanin | 1 | -1/+1
2023-11-28 | [AMDGPU] Fix folding of v2i16/v2f16 splat imms (#72709) | Stanislav Mekhanoshin | 1 | -0/+3
We can use inline constants with packed 16-bit operands, but these should use op_sel. Currently splat of inlinable constants is considered legal, which is not really true if we fail to fold it with op_sel and drop the high half. It may be legal as a literal but not as inline constant, but then usual literal checks must be performed. This patch makes these splat literals illegal but adds additional logic to the operand folding to keep current folds. This logic is somewhat heavy though. This has fixed constant bus violation in the fdot2 test.
2023-11-23 | [AMDGPU] Define new targets gfx1200 and gfx1201 (#73133) | Jay Foad | 1 | -0/+6
Define target names and ELF numbers for new GFX12 targets gfx1200 and gfx1201. For now they behave identically to GFX11.
2023-11-07 | Reland: [AMDGPU] Remove Code Object V3 (#67118) | Pierre van Houtryve | 1 | -8/+1
V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed.
- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
2023-10-18 | Revert "[AMDGPU] Remove Code Object V3 (#67118)" | pvanhout | 1 | -1/+8
This reverts commit 544d91280c26fd5f7acd70eac4d667863562f4cc.
2023-10-16 | [AMDGPU] Remove Code Object V3 (#67118) | Pierre van Houtryve | 1 | -8/+1
V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed.
- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
2023-10-12 | [AMDGPU] Change the representation of double literals in operands (#68740) | Stanislav Mekhanoshin | 1 | -0/+3
A 64-bit literal can be used as a 32-bit zero or sign extended operand. In the case of a double, zeroes are added to the low 32 bits. Currently the asm parser stores only the high 32 bits of a double into an operand. To support codegen as requested by https://github.com/llvm/llvm-project/issues/67781 we need to change the representation to store a full 64-bit value so that codegen can simply add immediates to an instruction. There is some code to support compatibility with existing tests and asm kernels. We allow short hex strings that represent only the high 32 bits of a double value as a valid literal.
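The relationship between the full 64-bit value and the 32 bits that end up in the encoding can be sketched as follows. This is an illustration only, assuming IEEE 754 doubles; `doubleToBits`, `fitsIn32BitLiteral`, `encodedHigh32` and `expandHigh32` are hypothetical helper names, not functions from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Bit pattern of a double, copied out without violating aliasing rules.
uint64_t doubleToBits(double D) {
  static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return Bits;
}

// A double is representable as a 32-bit literal only if its low 32 bits are
// zero, because zeroes are what gets added to the low half.
bool fitsIn32BitLiteral(uint64_t Bits) { return (Bits & 0xFFFFFFFFu) == 0; }

// The 32 bits that would be placed in the instruction encoding.
uint32_t encodedHigh32(uint64_t Bits) {
  assert(fitsIn32BitLiteral(Bits) && "low 32 bits would be lost");
  return static_cast<uint32_t>(Bits >> 32);
}

// Expand a short hex literal (high half only) to the full 64-bit value.
uint64_t expandHigh32(uint32_t High) { return uint64_t{High} << 32; }
```

For example, 1.0 has the bit pattern 0x3FF0000000000000, so it fits and its high half is 0x3FF00000, while 0.1 has non-zero low bits and does not fit.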
2023-09-29 | [AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on supported subtargets (#67461) | Mirko Brkušanin | 1 | -0/+1
In order to avoid duplicating every dpp pseudo opcode that has src1, we allow it for all opcodes and add manual checks on subtargets that do not support it.
2023-09-25 | [AMDGPU][NFC] Add True16 operand definitions. | Ivan Kosarev | 1 | -0/+4
Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D156103
2023-09-21 | [AMDGPU] Remove Code Object V2 (#65715) | Pierre van Houtryve | 1 | -7/+2
Code Object V2 has been deprecated for more than a year now. We can safely remove it from LLVM.
- [clang] Remove support for the `-mcode-object-version=2` option.
- [lld] Remove/refactor tests that were still using COV2
- [llvm] Update AMDGPUUsage.rst
  - Code Object V2 docs are left for informational purposes because those code objects may still be supported by the runtime/loaders for a while.
- [AMDGPU] Remove COV2 emission capabilities.
- [AMDGPU] Remove `MetadataStreamerYamlV2` which was only used by COV2
- [AMDGPU] Update all tests that were still using COV2
  - They are either deleted or ported directly to code object v4 (as v3 is also planned to be removed soon).
2023-09-19 | [AMDGPU] Add ASM and MC updates for preloading kernargs | Austin Kerbow | 1 | -0/+2
Add assembler directives for preloading kernel arguments that correspond to new fields in the kernel descriptor for the length and offset of arguments that will be placed in SGPRs prior to kernel launch. Alignment of the arguments in SGPRs is equivalent to the kernarg segment when accessed via the kernarg_segment_ptr. Kernarg SGPRs are allocated directly after other user SGPRs. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D159459
2023-08-22 | [AMDGPU] Rename 64BitDPP feature and fix the checks | Stanislav Mekhanoshin | 1 | -1/+7
Names '64BitDPP' and especially 'DPP64' were found to be misleading, and DPP64 can easily be mixed up with DPP16 and DPP8 while these are different concepts. DPP16 and DPP8 refer to lanes, whereas DPP64 refers to the operand size. In fact the essential part here is that these instructions are executed on the DP ALU, so rename the feature accordingly. I have also found a bug in a check for these instructions, which is fixed here and a common utility function is now used. Differential Revision: https://reviews.llvm.org/D158465
2023-08-21 | [AMDGPU] ISel for amdgpu_cs_chain[_preserve] functions | Diana Picus | 1 | -0/+3
Lower formal arguments and returns for functions with the `amdgpu_cs_chain` and `amdgpu_cs_chain_preserve` calling conventions:
* Put `inreg` arguments into SGPRs, starting at s0, and other arguments into VGPRs, starting at v8. No arguments should end up on the stack, if we don't have enough registers we should error out.
* Lower the return (which is always void) as an S_ENDPGM.
* Set the ScratchRSrc register to s48:51, as described in the docs.
* Set the SP to s32, matching amdgpu_gfx. This might be revisited in a future patch.
Differential Revision: https://reviews.llvm.org/D153517
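The argument placement rule in the first bullet can be sketched as a tiny allocator. This is a simplified model, not the backend's lowering code: `ArgLocation`, `assignChainArgs` and the register limits passed in are hypothetical, and every argument is assumed to occupy exactly one 32-bit register.

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct ArgLocation {
  bool InSGPR;  // true: scalar register, false: vector register
  unsigned Reg; // register index within its register file
};

// Hypothetical sketch: `inreg` arguments take SGPRs from s0 upward, all other
// arguments take VGPRs from v8 upward. SGPRLimit and VGPRLimit are exclusive
// upper bounds chosen by the caller. Returns std::nullopt instead of spilling
// to the stack when registers run out.
std::optional<std::vector<ArgLocation>>
assignChainArgs(const std::vector<bool> &IsInReg, unsigned SGPRLimit,
                unsigned VGPRLimit) {
  constexpr unsigned FirstSGPR = 0, FirstVGPR = 8;
  assert(VGPRLimit >= FirstVGPR && "VGPR limit must cover the v8 start");
  unsigned NextSGPR = FirstSGPR, NextVGPR = FirstVGPR;
  std::vector<ArgLocation> Locs;
  for (bool InReg : IsInReg) {
    unsigned &Next = InReg ? NextSGPR : NextVGPR;
    unsigned Limit = InReg ? SGPRLimit : VGPRLimit;
    if (Next >= Limit)
      return std::nullopt; // out of registers: report an error, never spill
    Locs.push_back({InReg, Next++});
  }
  return Locs;
}
```

Returning an empty optional mirrors the "error out" requirement: the caller has to handle the failure explicitly rather than fall back to stack slots.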
2023-08-18 | AMDGPU/Uniformity/GlobalISel: G_AMDGPU atomics are always divergent | Mirko Brkusanin | 1 | -0/+3
Patch by: Acim Maravic Differential Revision: https://reviews.llvm.org/D157091
2023-08-07 | [AMDGPU] Add and use SIInstrFlags::GWS. NFC. | Jay Foad | 1 | -0/+1
This reduces the number of places where we have to check for a list of DS_GWS_* opcodes. Differential Revision: https://reviews.llvm.org/D157099
2023-07-27 | [AMDGPU] Avoid CodeGen dependencies from AMDGPU/Utils and MCTargetDesc | Reid Kleckner | 1 | -4/+0
This required two substantial changes:
1. Moving a `getRegBitWidth(TargetRegisterClass)` overload out of Utils and into CodeGen
2. Passing the string function name to AMDGPUPALMetadata instead of the MachineFunction
Other changes are minor or updates to accommodate the first two. See issue #64166 for more information on the layering issue. Differential Revision: https://reviews.llvm.org/D156486
2023-07-21 | [AMDGPU] Remove std::optional from VOPD::ComponentProps. NFC. | Stanislav Mekhanoshin | 1 | -5/+4
This class has to be fast and efficient with a trivial copy constructor. Differential Revision: https://reviews.llvm.org/D155881
2023-07-15 | [amdgpu][nfc] Use unsigned for getIntegerPairAttribute to match the only call sites | Jon Chesterfield | 1 | -4/+4
2023-07-13 | [AMDGPU][MC] Fix handling of A16 operands in intersect_ray instructions. | Ivan Kosarev | 1 | -0/+1
The patch adds the support for 'noa16' operands in non-A16 variants of the instructions, fixes validation of A16 operands and eliminates the custom conversion to MCInst. Part of <https://github.com/llvm/llvm-project/issues/62629>. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155057
2023-07-04 | [AMDGPU] Add functions for composing and decomposing S_WAIT_DEPCTR operands | Stephen Thomas | 1 | -0/+27
Add functions AMDGPU::DepCtr::encodeField*() and AMDGPU::DepCtr::decodeField*() for each of vm_vsrc, va_vdst and sa_sdst. These are now used in AMDGPUInsertDelayAlu and GCNHazardRecognizer so as to make working with S_WAITCNT_DEPCTR operands easier and more readable. Differential Revision: https://reviews.llvm.org/D154424
2023-06-20 | [AMDGPU] Remove unused method Waitcnt::dominates(). NFC | Stephen Thomas | 1 | -5/+0
Differential Revision: https://reviews.llvm.org/D153322
2023-06-12 | [AMDGPU] Remove integer division in VOPD checks | Stanislav Mekhanoshin | 1 | -2/+3
There is no way any compiler can simplify this division, while the check is done rather often. Differential Revision: https://reviews.llvm.org/D152613
2023-06-07 | [AMDGPU][NFC] Add a getRegBitWidth() helper for TargetRegisterClass operands. | Ivan Kosarev | 1 | -0/+4
Reviewed By: foad Differential Revision: https://reviews.llvm.org/D152257
2023-02-23 | [AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC. | Jay Foad | 1 | -105/+0
This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies future changes to make some of it depend on the subtarget. Differential Revision: https://reviews.llvm.org/D144650
2023-02-23 | [AMDGPU][MC][GFX11] Add Partial NSA format for image sample instructions | Mirko Brkusanin | 1 | -0/+1
Image sample instructions that need more than 5 VGPRs for VAddr can use partial NSA for NSA encoding format. VGPRs that can not fit into the encoding are sequential after the last one. This patch adds assembly and disassembly parts. Differential Revision: https://reviews.llvm.org/D144033
2023-02-22 | [AMDGPU] Move splitMUBUFOffset from AMDGPUBaseInfo to SIInstrInfo | Jay Foad | 1 | -5/+0
Moving this out of AMDGPUBaseInfo enforces that AMDGPUBaseInfo should not be calling into GCNSubtarget. Differential Revision: https://reviews.llvm.org/D144564
2023-02-13 | Reapply "[AMDGPU] Modify adjustInliningThreshold to also consider the cost of passing function arguments through the stack" | Janek van Oirschot | 1 | -0/+3
Reapplies 142c28ffa1323e9a8d53200a22c80d5d778e0d0f as part of D140242 which got reverted due to amdgpu openmp test failures. This diff fixes said failures by eliding most of `adjustInliningThresholdUsingCallee` for indirect calls as the callee function is unavailable for indirect calls. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D143498
2023-02-10 | AMDGPU: Use module flag to get code object version at IR level follow-up | Changpeng Fang | 1 | -0/+12
Summary: This is part of the leftover work for https://reviews.llvm.org/D143138. In this work, we pass code object version as an argument to initialize target ID and use it for targetID dump. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D143293
2023-02-02 | AMDGPU: Use module flag to get code object version at IR level | Changpeng Fang | 1 | -4/+8
Summary: This patch introduces a mechanism to check the code object version from the module flag. This avoids checking from the command line. In case the module flag is missing, we use the current default code object version supported in the compiler. For tools whose inputs are not IR, we may need another approach (a directive, for example) to check the code object version. That will be in a separate patch later. For the LIT tests update, we directly add the module flag if there is only a single code object version associated with all checks in one file. In case of multiple code object versions in one file, we use the "sed" method to "clone" the checks to achieve the goal. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D14313
2023-01-31 | [AMDGPU] Use tablegen to list uniform intrinsics | Yashwant Singh | 1 | -0/+3
Right now we do opcode-wise matching to identify uniform/non-divergent AMDGPU intrinsics. It is duplicated in two places: once in the IR-level uniformity analysis and once at the MIR level. Moving them to a single tablegen table for consistency and adding an API wrapper to access them. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D142961
2023-01-23 | AMDGPU: Clean up LDS-related occupancy calculations | Nicolai Hähnle | 1 | -0/+4
Occupancy is expressed as waves per SIMD. This means that we need to take into account the number of SIMDs per "CU" or, to be more precise, the number of SIMDs over which a workgroup may be distributed. getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs into account at all. At the same time, we need to take into account that WGP mode offers access to a larger total amount of LDS, since this can affect how non-power-of-two LDS allocations are rounded. To make this work consistently, we distinguish between (available) local memory size and addressable local memory size (which is always limited by 64kB on gfx10+, even with WGP mode). This change results in a massive amount of test churn. A lot of it is caused by the fact that the default work group size is 1024, which means that (due to rounding effects) the default occupancy on older hardware is 8 instead of 10, which affects scheduling via register pressure estimates. I've adjusted most tests by just running the UTC tools, but in some cases I manually changed the work group size to 32 or 64 to make sure that work group size chunkiness has no effect. Differential Revision: https://reviews.llvm.org/D139468
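The rounding effect mentioned above (occupancy 8 instead of 10 for a work group size of 1024) can be reproduced with a simplified model. This is a sketch under stated assumptions, not the function from the patch: `occupancySketch` and all of its parameters are hypothetical, and the figures used in the note below (wave size 64, 4 SIMDs per CU, at most 10 waves per SIMD, 64 KiB of LDS) are assumptions chosen to match the older-hardware case.

```cpp
#include <algorithm>
#include <cassert>

// Simplified model: occupancy in waves per SIMD for a given work group shape
// and LDS usage. An LDSBytesPerGroup of 0 means LDS does not limit anything.
unsigned occupancySketch(unsigned WorkGroupSize, unsigned WaveSize,
                         unsigned SIMDsPerCU, unsigned MaxWavesPerSIMD,
                         unsigned LDSBytesPerGroup, unsigned LDSBytesAvailable) {
  assert(WorkGroupSize > 0 && WaveSize > 0 && SIMDsPerCU > 0);
  // Waves needed by one work group, rounded up.
  unsigned WavesPerGroup = (WorkGroupSize + WaveSize - 1) / WaveSize;
  // Whole work groups that fit into the wave slots of all SIMDs together.
  unsigned Groups = (MaxWavesPerSIMD * SIMDsPerCU) / WavesPerGroup;
  // Whole work groups that fit into the available LDS.
  if (LDSBytesPerGroup > 0)
    Groups = std::min(Groups, LDSBytesAvailable / LDSBytesPerGroup);
  // Spread the resulting waves over the SIMDs, rounding up.
  unsigned Waves = Groups * WavesPerGroup;
  unsigned PerSIMD = (Waves + SIMDsPerCU - 1) / SIMDsPerCU;
  return std::min(PerSIMD, MaxWavesPerSIMD);
}
```

With the assumed figures, a 1024-item work group needs 16 waves, only two such groups fit into 40 wave slots, and 32 waves over 4 SIMDs gives 8. A 64-item work group needs a single wave, so all 40 slots can be filled and the result is 10.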
2023-01-23 | [MC] Make more use of MCInstrDesc::operands. NFC. | Jay Foad | 1 | -1/+1
Change MCInstrDesc::operands to return an ArrayRef so we can easily use it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end. A future patch will remove opInfo_begin and opInfo_end. Also use it instead of raw access to the OpInfo pointer. A future patch will remove this pointer. Differential Revision: https://reviews.llvm.org/D142213
2023-01-12 | Partially reapply "AMDGPU: Invert handling of enqueued block detection" | Matt Arsenault | 1 | -0/+3
This mostly reverts commit 270e96f435596449002fc89962595497481c8770. Keep the attributor related changes around, but functionally restore the old behavior as a workaround. Device enqueue goes back to not working at -O0 with this version.
2023-01-12 | [AMDGPU] Simplify getNumFlatOffsetBits. NFC. | Jay Foad | 1 | -3/+4
Previously we considered this field to be either N-bit unsigned or N+1-bit signed, depending on the instruction. I think it's conceptually simpler to say that the field is always N+1-bit signed, but some instructions do not allow negative values. Differential Revision: https://reviews.llvm.org/D140883
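The "always signed, but sometimes non-negative only" view can be sketched as a single range check plus an optional sign restriction. `fitsSigned` and `isLegalFlatOffsetSketch` are hypothetical names for this illustration, and the 13-bit width used in the note below is an assumed example value rather than a figure from the patch.

```cpp
#include <cassert>
#include <cstdint>

// True if Offset fits in a two's-complement field of NumBits bits.
bool fitsSigned(unsigned NumBits, int64_t Offset) {
  assert(NumBits > 0 && NumBits < 64 && "unsupported field width");
  int64_t Min = -(int64_t{1} << (NumBits - 1));
  int64_t Max = (int64_t{1} << (NumBits - 1)) - 1;
  return Offset >= Min && Offset <= Max;
}

// Hypothetical legality check: the field is always treated as signed, and
// instructions that cannot take negative offsets simply add a sign test.
bool isLegalFlatOffsetSketch(int64_t Offset, unsigned NumBits,
                             bool AllowNegative) {
  return fitsSigned(NumBits, Offset) && (AllowNegative || Offset >= 0);
}
```

With an assumed 13-bit field, offsets from -4096 to 4095 are legal when negative values are allowed and 0 to 4095 otherwise, so the non-negative case needs no separate "N-bit unsigned" width.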