riscv-gnu-toolchain/llvm.git

Age	Commit message	Author	Files	Lines
6 days	[AMDGPU][AsmParser] Introduce MC representation for lit() and lit64(). (#160316)	Ivan Kosarev	1	-6/+9
	And rework the lit64() support to use it. The rules for when to add lit64() can be simplified and improved. In this change, however, we just follow the existing conventions on the assembler and disassembler sides. In codegen we do not (and normally should not need to) add explicit lit() and lit64() modifiers, so the codegen tests lose them. The change is an NFCI otherwise. Simplifies printing operands.
2025-09-04	[AMDGPU] Ensure positive InstOffset for buffer operations (#145504)	Aleksandar Spasojevic	1	-0/+3
	GFX12+ buffer ops require a non-negative InstOffset per the AMD hardware spec. The assembler and disassembler now reject negative buffer offsets.
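The kind of range check this commit describes might look like the sketch below. It is an illustration only: the helper name `isLegalBufferInstOffset` and the 24-bit signed field width are assumptions made for the example, not names or values taken from the patch.

```cpp
#include <cstdint>

// Hypothetical helper: true if Offset fits a signed field of Bits bits
// and is not negative. The default field width is an assumption.
constexpr bool isLegalBufferInstOffset(int64_t Offset, unsigned Bits = 24) {
  // Largest value representable in a signed Bits-wide field.
  const int64_t Max = (int64_t{1} << (Bits - 1)) - 1;
  return Offset >= 0 && Offset <= Max;
}
```

An assembler-side caller would report a diagnostic when the helper returns false, and a disassembler-side caller would refuse to decode the instruction.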
2025-09-01	[LLVM][MC][DecoderEmitter] Add support to specialize decoder per bitwidth (#154865)	Rahul Joshi	1	-38/+0
	This change adds an option to specialize decoders per bitwidth, which can help reduce the (compiled) code size of the decoder code.
	Current state:
	Currently, the code generated by the decoder emitter consists of two key functions: `decodeInstruction` which is the entry point into the generated code and `decodeToMCInst` which is invoked when a decode op is reached while traversing through the decoder table. Both functions are templated on `InsnType` which is the raw instruction bits that are supplied to `decodeInstruction`.
	Several backends call `decodeInstruction` with different `InsnType` types, leading to several template instantiations of these functions in the final code. As an example, AMDGPU instantiates this function with type `DecoderUInt128` type for decoding 96/128-bit instructions, `uint64_t` for decoding 64-bit instructions, and `uint32_t` for decoding 32-bit instructions.
	Since there is just one `decodeToMCInst` in the generated code, it has code that handles decoding for all instruction sizes. However, the decoders emitted for different instructions sizes rarely have any intersection with each other. That means, in the AMDGPU case, the instantiation with InsnType == DecoderUInt128 has decoder code for 32/64-bit instructions that is never exercised. Conversely, the instantiation with InsnType == uint64_t has decoder code for 128/96/32-bit instructions that is never exercised. This leads to unnecessary dead code in the generated disassembler binary (that the compiler cannot eliminate by itself).
	New state:
	With this change, we introduce an option `specialize-decoders-per-bitwidth`. Under this mode, the DecoderEmitter will generate several versions of `decodeToMCInst` function, one for each bitwidth. The code is still templated, but will require backends to specify, for each `InsnType` used, the bitwidth of the instruction that the type is used to represent using a type-trait `InsnBitWidth`. This will enable the templated code to choose the right variant of `decodeToMCInst`. Under this mode, a particular instantiation will only end up instantiating a single variant of `decodeToMCInst` generated and that will include only those decoders that are applicable to a single bitwidth, resulting in elimination of the code duplication through instantiation and a reduction in code size.
	Additionally, under this mode, decoders are uniqued only within a given bitwidth (as opposed to across all bitwidths without this option), so the decoder index values assigned are smaller, and consume less bytes in their ULEB128 encoding. As a result, the generated decoder tables can also reduce in size.
	Adopt this feature for the AMDGPU and RISCV backend. In a release build, this results in a net 55% reduction in the .text size of libLLVMAMDGPUDisassembler.so and a 5% reduction in the .rodata size. For RISCV, which today uses a single `uint64_t` type, this results in a 3.7% increase in code size (expected as we instantiate the code 3 times now).
	Actual measured sizes are as follows:
	```
	Baseline commit: 72c04bb882ad70230bce309c3013d9cc2c99e9a7
	Configuration: Ubuntu clang version 18.1.3, release build with asserts disabled.

	AMDGPU        Before     After      Change
	======================================================
	.text         612327     275607     55% reduction
	.rodata       369728     351336     5% reduction

	RISCV:
	======================================================
	.text         47407      49187      3.7% increase
	.rodata       35768      35839      0.1% increase
	```
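The dispatch described in this commit message can be sketched roughly as follows. This is a simplified model, not the generated code: the shape of `InsnBitWidth` (a struct with a `value` member), the `DecodeToMCInst` struct, the `Wide128` type and the placeholder return values are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a wide (96/128-bit) instruction type.
struct Wide128 { uint64_t Lo; uint64_t Hi; };

// Type trait: the backend states which bit width each InsnType represents.
template <typename InsnType> struct InsnBitWidth;
template <> struct InsnBitWidth<uint32_t> { static constexpr std::size_t value = 32; };
template <> struct InsnBitWidth<uint64_t> { static constexpr std::size_t value = 64; };
template <> struct InsnBitWidth<Wide128>  { static constexpr std::size_t value = 128; };

// One decoder variant per bit width. The bodies are placeholders; in
// generated code each would hold only the decoders for that width.
template <std::size_t BitWidth> struct DecodeToMCInst;
template <> struct DecodeToMCInst<32> {
  static constexpr int run(unsigned Idx) { return 3200 + static_cast<int>(Idx); }
};
template <> struct DecodeToMCInst<64> {
  static constexpr int run(unsigned Idx) { return 6400 + static_cast<int>(Idx); }
};
template <> struct DecodeToMCInst<128> {
  static constexpr int run(unsigned Idx) { return 12800 + static_cast<int>(Idx); }
};

// Entry point: each instantiation references exactly one variant, so the
// decoders for the other widths are never instantiated for this InsnType.
template <typename InsnType>
constexpr int decodeInstruction(unsigned DecoderIdx) {
  return DecodeToMCInst<InsnBitWidth<InsnType>::value>::run(DecoderIdx);
}
```

Because decoder indices restart at zero within each width in this model, the same small index can name different decoders in different variants, which mirrors the per-bitwidth uniquing the message mentions.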
2025-07-21	AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684)	Changpeng Fang	1	-0/+1
	Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-11	[AMDGPU] MC support for v_fmaak_f64/v_fmamk_f64 gfx1250 instructions (#148282)	Stanislav Mekhanoshin	1	-0/+1

2025-07-09	[AMDGPU] gfx1250: MC support for 64-bit literals (#147861)	Stanislav Mekhanoshin	1	-0/+1

2025-07-09	[NFC][TableGen] Change DecoderEmitter `insertBits` to use integer types only (#147613)	Rahul Joshi	1	-17/+1
	The `insertBits` templated function generated by DecoderEmitter is called with variable `tmp` of type `TmpType` which is:
	```
	using TmpType = std::conditional_t<std::is_integral<InsnType>::value, InsnType, uint64_t>;
	```
	That is, `TmpType` is always an integral type. Change the generated `insertBits` to be valid only for integer types, and eliminate the unused `insertBits` function from `DecoderUInt128` in AMDGPUDisassembler.h.
	Additionally, drop some of the requirements `InsnType` must support as they no longer seem to be required.
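A minimal sketch of an `insertBits` constrained to integral types is shown below. It is not the generated code: the value-returning signature and the `NotAnInteger` type are assumptions chosen so the example is self-contained, and the sketch is intended for unsigned types.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical non-integral instruction type, used only to show TmpType.
struct NotAnInteger {};

// As in the commit message: TmpType is always integral.
template <typename InsnType>
using TmpType =
    std::conditional_t<std::is_integral<InsnType>::value, InsnType, uint64_t>;

// Sketch: OR the low NumBits of Bits into Field at StartBit. enable_if
// makes the template unavailable for non-integral types.
template <typename IntType>
constexpr std::enable_if_t<std::is_integral<IntType>::value, IntType>
insertBits(IntType Field, IntType Bits, unsigned StartBit, unsigned NumBits) {
  // Mask of the NumBits low bits; handles NumBits == full width.
  const IntType Mask =
      NumBits >= sizeof(IntType) * 8
          ? static_cast<IntType>(~IntType{0})
          : static_cast<IntType>((IntType{1} << NumBits) - 1);
  return static_cast<IntType>(Field | ((Bits & Mask) << StartBit));
}
```

With this constraint, a call such as `insertBits(NotAnInteger{}, ...)` fails to compile instead of requiring the wide type to provide its own overload.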
2025-06-21	[AMDGPU] Rename call instructions from b64 to i64 (#145103)	Stanislav Mekhanoshin	1	-0/+1
	These get renamed in gfx1250 and on from B64 to I64: S_CALL_I64, S_GET_PC_I64, S_RFE_I64, S_SET_PC_I64, S_SWAP_PC_I64.
2025-05-09	[AMDGPU][NFC] Remove _DEFERRED operands. (#139123)	Ivan Kosarev	1	-1/+1
	All immediates are deferred now.
2025-05-08	[AMDGPU][NFC] Get rid of OPW constants. (#139074)	Ivan Kosarev	1	-28/+7
	We can infer the widths from register classes and represent them as numbers.
2025-05-08	[AMDGPU][Disassembler][NFCI] Always defer immediate operands. (#138885)	Ivan Kosarev	1	-13/+5
	Removes the need to parameterise decoders with OperandSemantics, ImmWidth and MandatoryLiteral. Likely allows further simplification of handling _DEFERRED immediates. Tested to work downstream.
2025-04-18	[LLVM][TableGen] Move DecoderEmitter output to anonymous namespace (#136214)	Rahul Joshi	1	-32/+2
	- Move the code generated by DecoderEmitter to anonymous namespace.
	- Move AMDGPU's usage of this code from header file to .cpp file.
	Note, we get build errors like "call to function 'decodeInstruction' that is neither visible in the template definition nor found by argument-dependent lookup" if we do not change AMDGPU.
2025-01-14	[AMDGPU][True16][MC] true16 for v_cmp_lt_f16 (#122499)	Brox Chen	1	-0/+1
	True16 format for v_cmp_lt_f16. Update VOPC t16 and fake16 pseudo.
2025-01-03	[AMDGPU][MC] Allow null where 128b or larger dst reg is expected (#115200)	Jun Wang	1	-0/+1
	For GFX10+, currently null cannot be used as dst reg in instructions that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4). This patch fixes this problem while ensuring null cannot be used as S#, T#, or V#.
2024-11-25	AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp|bf]6 of gfx950 (#117590)	Matt Arsenault	1	-0/+1
	Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-21	AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)	Matt Arsenault	1	-0/+1
	These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp). I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns. The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.
2024-06-18	[AMDGPU][MC] Support UC_VERSION_* constants. (#95618)	Ivan Kosarev	1	-0/+7
	Our other tools support them, so we want them in LLVM assembler/disassembler too.
2024-04-18	[AMDGPU] Add disassembler diagnostics for invalid kernel descriptors (#87400)	Emma Pilkington	1	-12/+13
	These mostly check for various reserved bits being set. The diagnostics for gpu-dependent reserved bits have a bit more context since they seem like the most likely ones to be observed in practice. This commit also improves the error handling mechanism for MCDisassembler::onSymbolStart(). Previously it had a comment stream parameter that was just being ignored by llvm-objdump; now it returns errors using Expected<T>.
2024-02-26	[AMDGPU] Only try DecoderTables for the current subtarget. NFCI. (#82992)	Jay Foad	1	-0/+1
	Speed up disassembly by only calling tryDecodeInst for DecoderTables that make sense for the current subtarget. This gives a 1.3x speed-up on check-llvm-mc-disassembler-amdgpu in my Release+Asserts build.
2024-02-23	[AMDGPU] Simplify AMDGPUDisassembler::getInstruction by removing Res. (#82775)	Jay Foad	1	-9/+9
	Remove all the code that set and tested Res. Change all convert* functions to return void since none of them can fail. getInstruction only has one main point of failure, after all calls to tryDecodeInst have failed.
2024-02-22	[AMDGPU] Split Dpp8FI and Dpp16FI operands (#82379)	Jay Foad	1	-0/+1
	Split Dpp8FI and Dpp16FI into two different operands sharing an AsmOperandClass. They are parsed and rendered identically as fi:1 but the encoding is different: for DPP16 FI is a single bit, but for DPP8 it uses two different special values in the src0 field. Having a dedicated decoder for Dpp8FI allows it to reject other (non-special) src0 values so that AMDGPUDisassembler::getInstruction no longer needs to call isValidDPP8 to do post hoc validation of decoded DPP8 instructions.
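A dedicated decoder of the kind described above might be sketched like this. The two special src0 values (`0xE9` and `0xEA`) and the function name are assumptions for illustration and should be checked against the AMDGPU sources before being relied on.

```cpp
#include <optional>

// Assumed special src0 encodings for DPP8 fi:0 and fi:1 (illustrative).
constexpr unsigned DPP8_FI_0 = 0xE9;
constexpr unsigned DPP8_FI_1 = 0xEA;

// Returns the fi value (0 or 1) for a special encoding, or nullopt so the
// caller can reject any other src0 value at decode time.
constexpr std::optional<unsigned> decodeDpp8FI(unsigned Src0Field) {
  if (Src0Field == DPP8_FI_0)
    return 0u;
  if (Src0Field == DPP8_FI_1)
    return 1u;
  return std::nullopt;
}
```

Rejecting non-special values inside the operand decoder is what removes the need for a separate validation pass after the whole instruction has been decoded.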
2024-02-19	[AMDGPU] Fix decoder for BF16 inline constants (#82276)	Stanislav Mekhanoshin	1	-8/+13
	Fix #82039.
2024-02-08	[AMDGPU][True16] Support VOP3 source DPP operands. (#80892)	Ivan Kosarev	1	-0/+1

2024-02-01	[llvm-objdump][AMDGPU] Pass ELF ABIVersion through disassembler (#78907)	Emma Pilkington	1	-0/+3
	Admittedly, it's a bit ugly to pass the ABIVersion through onSymbolStart but I'm not sure what a better place for it would be.
2023-12-13	[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)	Mariusz Sikora	1	-0/+1
	Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-04	[AMDGPU][MC] Add GFX12 VIMAGE and VSAMPLE encodings (#74062)	Mirko Brkušanin	1	-0/+1

2023-10-12	[AMDGPU] Change the representation of double literals in operands (#68740)	Stanislav Mekhanoshin	1	-4/+5
	A 64-bit literal can be used as a 32-bit zero- or sign-extended operand. In the case of a double, zeroes are added to the low 32 bits. Currently the asm parser stores only the high 32 bits of a double in an operand. To support codegen as requested in https://github.com/llvm/llvm-project/issues/67781 we need to change the representation to store the full 64-bit value so that codegen can simply add immediates to an instruction. There is some code to preserve compatibility with existing tests and asm kernels: short hex strings that represent only the high 32 bits of a double value are still accepted as valid literals.
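The relationship between the 32-bit encoding and the full 64-bit double value can be illustrated with the sketch below. The helper names are hypothetical and the functions operate on raw IEEE-754 bit patterns rather than on any LLVM operand class.

```cpp
#include <cstdint>

// True if a double's bit pattern can be carried by its high 32 bits alone,
// i.e. the low 32 bits are all zero.
constexpr bool fitsInHigh32(uint64_t DoubleBits) {
  return (DoubleBits & 0xFFFFFFFFull) == 0;
}

// Expand a 32-bit literal to the full 64-bit double pattern: the literal
// becomes the high half and zeroes fill the low half.
constexpr uint64_t expandHigh32(uint32_t Hi) {
  return static_cast<uint64_t>(Hi) << 32;
}

// Extract the 32 bits that would be encoded for a double literal.
constexpr uint32_t high32(uint64_t DoubleBits) {
  return static_cast<uint32_t>(DoubleBits >> 32);
}
```

For example, 1.0 has the bit pattern `0x3FF0000000000000`, so it round-trips through `0x3FF00000`, while 0.1 (`0x3FB999999999999A`) does not fit in the high half alone.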
2023-09-25	[AMDGPU][NFC] Add True16 operand definitions.	Ivan Kosarev	1	-0/+5
	Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D156103
2023-09-23	Reapply "[AMDGPU] Introduce real and keep fake True16 instructions."	Ivan Kosarev	1	-0/+11
	Reverts 6cb3866b1ce9d835402e414049478cea82427cf1. Analysis of failures on buildbots with expensive checks enabled showed that the problem was triggered by changes in another commit, 469b3bfad20550968ac428738eb1f8bb8ce3e96d, and was caused by the bug addressed in #67245.
2023-09-22	Revert "[AMDGPU] Introduce real and keep fake True16 instructions."	Ivan Kosarev	1	-11/+0
	This reverts commit 0f864c7b8bc9323293ec3d85f4bd5322f8f61b16 due to failures on expensive checks.
2023-09-22	[AMDGPU] Introduce real and keep fake True16 instructions.	Ivan Kosarev	1	-0/+11
	The existing fake True16 instructions using 32-bit VGPRs are supposed to co-exist with real ones until all the necessary True16 functionality is implemented and relevant tests are updated. Reviewed By: arsenm, Joe_Nash Differential Revision: https://reviews.llvm.org/D156101
2023-09-19	[AMDGPU] Add ASM and MC updates for preloading kernargs	Austin Kerbow	1	-0/+1
	Add assembler directives for preloading kernel arguments that correspond to new fields in the kernel descriptor for the length and offset of arguments that will be placed in SGPRs prior to kernel launch. Alignment of the arguments in SGPRs is equivalent to the kernarg segment when accessed via the kernarg_segment_ptr. Kernarg SGPRs are allocated directly after other user SGPRs. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D159459
2023-07-06	[AMDGPU] Improve assembler + disassembler handling of kernel descriptors	Scott Linder	1	-0/+3
	* Relax the AsmParser to accept `.amdhsa_wavefront_size32 0` when the `.amdhsa_shared_vgpr_count` directive is present.
	* Teach the KD disassembler to respect the setting of KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32 when calculating the value of `.amdhsa_next_free_vgpr`.
	* Teach the KD disassembler to disassemble COMPUTE_PGM_RSRC3 for gfx90a and gfx10+.
	* Include "pseudo directive" comments for gfx10 fields which are not controlled by any assembler directive.
	* Fix disassembleObject failure diagnostic in llvm-objdump to not hard-code a comment string, and to follow the convention of not capitalizing the first sentence.
	Reviewed By: rochauha
	Differential Revision: https://reviews.llvm.org/D128014
2023-06-29	[NFC][AMDGPU] Refactor AMDGPUDisassembler	Scott Linder	1	-0/+7
	Clean up ahead of a patch to fix bugs in the AMDGPUDisassembler. Use split-file to simplify and extend existing kernel-descriptor disassembly tests. Add a comment to AMDHSAKernelDescriptor.h, as at least one small step towards keeping all kernel-descriptor sensitive code in sync. Reviewed By: MaskRay, kzhuravl, arsenm Differential Revision: https://reviews.llvm.org/D130105
2023-06-25	[llvm] Add missing StringExtras.h includes	Elliot Goodrich	1	-1/+2
	In preparation for removing the `#include "llvm/ADT/StringExtras.h"` from the header to source file of `llvm/Support/Error.h`, first add in all the missing includes that were previously included transitively through this header.
2023-04-26	[AMDGPU][Disassembler] Fix a spurious error message in an instruction comment.	Ivan Kosarev	1	-2/+13
	The patch prevents pollution of instruction comments with error messages generated during unsuccessful decoding attempts. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D149049
2023-02-01	AMDGPU/MC: Refactor decoders. Rework decoders for float immediates	Petar Avramovic	1	-56/+7
	decodeFPImmed creates an immediate operand using the register operand width, but the size of the created immediate should correspond to the OperandType of the RegisterOperand. For example, OPW128 could be used for RegisterOperands that use v2f64, v4f32 and v8f16. Each RegisterOperand would have a different OperandType and require that the immediate is decoded as a 64-, 32- and 16-bit immediate respectively. decodeOperand_<RegClass> only provides the width for register decoding; introduce decodeOperand_<RegClass>_Imm<ImmWidth>, which also provides the width for immediate decoding.
	Refactor RegisterOperands:
	- decoders get _Imm<ImmWidth> suffix in some cases
	- removed unused RegisterOperands defined via multiclass
	- use different RegisterOperand in a few places, new RegisterOperand's decoder corresponds to the number of bits used for operand's encoding
	Refactor decoder functions:
	- add asserts for the size of encoding that will be decoded
	- regroup them according to the method of decoding
	decodeOperand_<RegClass> (register only, no immediate) decoders can now create an immediate of consistent size; use it for better diagnostics of 'invalid immediate'.
	Differential Revision: https://reviews.llvm.org/D142636
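The mismatch between register width and immediate width described in this commit can be made concrete with a small calculation. The helper below is hypothetical and only illustrates the arithmetic; the patch itself encodes the immediate width in the decoder name rather than computing it.

```cpp
// Hypothetical helper: width in bits of one element, and therefore of the
// immediate, for a vector operand of NumElts elements held in a register
// operand of RegBits bits.
constexpr unsigned immWidthFor(unsigned RegBits, unsigned NumElts) {
  return RegBits / NumElts;
}
```

All three vector types named in the message share a 128-bit register width, yet `immWidthFor(128, 2)`, `immWidthFor(128, 4)` and `immWidthFor(128, 8)` give 64, 32 and 16, which is why the register width alone cannot select the immediate decoder.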
2022-12-04	[MC] llvm::Optional => std::optional	Fangrui Song	1	-4/+3
	https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
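The migration amounts to replacing the `llvm::Optional` accessors with their `std::optional` equivalents. The sketch below shows the `std::optional` side only; the function names are hypothetical and the old spellings appear in comments for comparison.

```cpp
#include <optional>

// llvm::Optional spelling:  Width.hasValue()
constexpr bool isKnown(std::optional<int> Width) {
  return Width.has_value();
}

// llvm::Optional spelling:  Width.getValueOr(Default)
constexpr int widthOrDefault(std::optional<int> Width, int Default) {
  return Width.value_or(Default);
}

// llvm::Optional spelling:  Width.getValue()
constexpr int widthUnchecked(std::optional<int> Width) {
  return *Width; // caller must have checked has_value() first
}
```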
2022-11-29	[AMDGPU] Add support for new LLVM vector types	Mateja Marjanovic	1	-0/+16
	Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384. Differential Revision: https://reviews.llvm.org/D138205
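For orientation, each of these register classes spans a whole number of 32-bit registers. The helper below is a hypothetical illustration of that arithmetic and is not part of the patch.

```cpp
// Number of 32-bit registers covered by a register tuple of BitWidth bits.
constexpr unsigned numDwords(unsigned BitWidth) {
  return BitWidth / 32;
}
```

The four new widths therefore correspond to tuples of 9, 10, 11 and 12 registers.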
2022-10-07	[AMDGPU][MC][GFX11] Correct v_fmac_.*_e64_dpp	Dmitry Preobrazhensky	1	-0/+3
	Differential Revision: https://reviews.llvm.org/D134961
2022-09-21	Revert "[NFC][AMDGPU] Refactor AMDGPUDisassembler"	Scott Linder	1	-7/+0
	This reverts commit f5831514612cd9e014e4fc7455b75411531fe6e1.
2022-09-20	[NFC][AMDGPU] Refactor AMDGPUDisassembler	Scott Linder	1	-0/+7
	Clean up ahead of a patch to fix bugs in the AMDGPUDisassembler. Use lit.local.cfg substitutions and more idiomatic use of split-file to simplify and extend existing kernel-descriptor disassembly tests. Add a comment to AMDHSAKernelDescriptor.h, as at least one small step towards keeping all kernel-descriptor sensitive code in sync. Reviewed By: kzhuravl, arsenm Differential Revision: https://reviews.llvm.org/D130105
2022-09-20	[AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C	Joe Nash	1	-0/+1
	Due to the encoding changes in GFX11, we had a hack in place that disables the use of VGPRs above 128. This patch removes the need for that hack. We introduce a new register class VGPR_32_Lo128 which is used for 16-bit operands of VOP1, VOP2, and VOPC instructions. This register class only has the low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1, VOP2, and VOPC instructions are correctly limited to use the first 128 VGPRs, while the other instructions can freely use all 256. We introduce new pseudo-instructions used on GFX11 which have the suffix t16 (True 16) to use the VGPR_32_Lo128 register class. Reviewed By: foad, rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D133723
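The effect of the new register class can be modelled as a simple bound on the VGPR index. The function below is a hypothetical sketch of that constraint; in the patch the limit is expressed through the register class definition, not through a runtime check like this one.

```cpp
// Hypothetical model: 16-bit VOP1/VOP2/VOPC operands may only name the low
// 128 VGPRs, while other instructions may use all 256.
constexpr bool isAllowedVGPR(unsigned Index, bool Is16BitVOP12C) {
  const unsigned Limit = Is16BitVOP12C ? 128u : 256u;
  return Index < Limit;
}
```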
2022-07-15	[AMDGPU][MC][GFX11] Correct disassembly of *_e64_dpp opcodes which support op_sel	Dmitry Preobrazhensky	1	-0/+1
	These opcodes cannot be disassembled because op_sel operand is missing - it must be added manually. See https://github.com/llvm/llvm-project/issues/56512 for detailed issue analysis. Differential Revision: https://reviews.llvm.org/D129637
2022-06-24	[AMDGPU] gfx11 VOPD instructions MC support	Joe Nash	1	-0/+1
	VOPD is a new encoding for dual-issue instructions for use in wave32. This patch includes MC layer support only. A VOPD instruction is constituted of an X component (for which there are 13 possible opcodes) and a Y component (for which there are the 13 X opcodes plus 3 more). Most of the complexity in defining and parsing a VOPD operation arises from the possible different total numbers of operands and deferred parsing of certain operands depending on the constituent X and Y opcodes. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D128218
2022-06-09	[AMDGPU] gfx11 VOPC instructions	Joe Nash	1	-0/+1
	Supports encoding existing instructions on gfx11 and MC support for the new VOPC dpp instructions. Patch 19/N for upstreaming of AMDGPU gfx11 architecture Depends on D126978 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126989
2022-06-08	[AMDGPU] gfx11 VOP3P instruction MC support	Joe Nash	1	-0/+1
	Includes dpp versions of VOP3P instructions. Patch 18/N for upstreaming of AMDGPU gfx11 architecture Depends on D126917 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126978
2022-06-07	Reland [AMDGPU] gfx11 vop3dpp instructions	Joe Nash	1	-0/+55
	There was an issue with encoding wide (>64 bit) instructions on BigEndian hosts, which is fixed in D127195. Therefore reland this. gfx11 adds the ability to use dpp modifiers on vop3 instructions. This patch adds machine code layer support for that. The MCCodeEmitter is changed to use APInt instead of uint64_t to support these wider instructions. Patch 16/N for upstreaming of AMDGPU gfx11 architecture Differential Revision: https://reviews.llvm.org/D126483
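One way to emit an encoding wider than 64 bits without depending on host byte order is to extract each byte with shifts. The sketch below uses a plain struct as a stand-in for a wide value; it does not use APInt and is not the fix from D127195, only an illustration of host-independent serialisation. `WideEncoding` and `toLittleEndianBytes` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 96-bit encoding held as a low 64-bit and a high 32-bit word.
struct WideEncoding {
  uint64_t Lo;
  uint32_t Hi;
};

// Serialise to little-endian bytes using shifts only, so the result is the
// same on little-endian and big-endian hosts.
constexpr std::array<uint8_t, 12> toLittleEndianBytes(WideEncoding E) {
  std::array<uint8_t, 12> Bytes{};
  for (std::size_t I = 0; I < 8; ++I)
    Bytes[I] = static_cast<uint8_t>(E.Lo >> (8 * I));
  for (std::size_t I = 0; I < 4; ++I)
    Bytes[8 + I] = static_cast<uint8_t>(E.Hi >> (8 * I));
  return Bytes;
}
```

Copying the in-memory representation of the words directly would instead produce different byte sequences on hosts with different endianness.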
2022-06-06	Revert "[AMDGPU] gfx11 vop3dpp instructions"	Joe Nash	1	-55/+0
	This reverts commit 99a83b1286748501e0ccf199a582dc3ec5451ef5.
2022-06-06	[AMDGPU] gfx11 vop3dpp instructions	Joe Nash	1	-0/+55
	gfx11 adds the ability to use dpp modifiers on vop3 instructions. This patch adds machine code layer support for that. The MCCodeEmitter is changed to use APInt instead of uint64_t to support these wider instructions. Patch 16/N for upstreaming of AMDGPU gfx11 architecture Depends on D126475 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126483