riscv-gnu-toolchain/llvm.git

Age	Commit message	Author	Files	Lines
6 days	[AMDGPU][AsmParser] Introduce MC representation for lit() and lit64(). (#160316)	Ivan Kosarev	2	-30/+82
	And rework the lit64() support to use it. The rules for when to add lit64() can be simplified and improved. In this change, however, we just follow the existing conventions on the assembler and disassembler sides. In codegen we do not (and normally should not need to) add explicit lit() and lit64() modifiers, so the codegen tests lose them. The change is an NFCI otherwise. Simplifies printing operands.
2025-09-11	AMDGPU: Remove most manual AVLdSt decoder code (#157861)	Matt Arsenault	1	-42/+0
	This was additional hacking around using incorrect register class constraints for paired data operands. I'm not really sure why we need any of what's left. In particular the IS_VGPR special case seems backwards from how the encoding works.
2025-09-10	Revert "[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588)" (#157639)	Pierre van Houtryve	1	-3/+0
	This reverts commit be17791f2624f22b3ed24a2539406164a379125d. This is not necessary for gfx1250 anymore.
2025-09-04	[AMDGPU] Ensure positive InstOffset for buffer operations (#145504)	Aleksandar Spasojevic	2	-0/+29
	GFX12+ buffer ops require positive InstOffset per AMD hardware spec. Modified assembler/disassembler to reject negative buffer offsets.
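	The rejection rule described above can be sketched as follows. This is a minimal illustration, not the code from the patch: `signExtend`, `decodeBufferInstOffset`, and the field width passed by the caller are all hypothetical names and parameters introduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Sign-extend the low `Bits` bits of Raw (1 <= Bits <= 63).
constexpr int64_t signExtend(uint64_t Raw, unsigned Bits) {
  const uint64_t SignBit = uint64_t{1} << (Bits - 1);
  const uint64_t Value = Raw & ((SignBit << 1) - 1);
  return static_cast<int64_t>((Value ^ SignBit) - SignBit);
}

// Hypothetical check (not LLVM's actual API): interpret the raw field as a
// signed offset and refuse it when it is negative.
inline std::optional<int64_t> decodeBufferInstOffset(uint64_t RawField,
                                                     unsigned FieldBits) {
  assert(FieldBits >= 1 && FieldBits <= 63 && "unsupported field width");
  const int64_t Offset = signExtend(RawField, FieldBits);
  if (Offset < 0)
    return std::nullopt; // a real assembler/disassembler would report an error
  return Offset;
}
```

	A caller would treat an empty result as a rejected encoding; the field width it passes is an assumption of this sketch and is not stated in the commit message.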
2025-09-03	[AMDGPU] Define 1024 VGPRs on gfx1250 (#156765)	Stanislav Mekhanoshin	1	-4/+26
	This is baseline support; it is not usable yet.
2025-09-02	[MC][DecoderEmitter] Fix build warning: explicit specialization cannot have a storage class (#156375)	Rahul Joshi	1	-6/+6
	Move `InsnBitWidth` template into anonymous namespace in the generated code and move template specialization of `InsnBitWidth` to anonymous namespace as well, and drop `static` for them. This makes `InsnBitWidth` completely private to each target and fixes the "explicit specialization cannot have a storage class" warning as well as any potential linker errors if `InsnBitWidth` is kept in the `llvm::MCD` namespace.
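	As a rough illustration of the shape this commit describes, the sketch below declares a variable template in an anonymous namespace and specializes it without `static`. The name `InsnBitWidth` comes from the commit message; the default value of 0 and the exact declarations are assumptions made for this example, not a copy of the generated code.

```cpp
#include <cstdint>

namespace {
// Primary variable template. The default of 0 is an assumption of this sketch.
template <typename T> constexpr uint32_t InsnBitWidth = 0;

// Explicit specializations carry no storage class. Writing
//   template <> static constexpr uint32_t InsnBitWidth<uint32_t> = 32;
// is the form that Clang rejects under -Wexplicit-specialization-storage-class.
template <> constexpr uint32_t InsnBitWidth<uint32_t> = 32;
template <> constexpr uint32_t InsnBitWidth<uint64_t> = 64;
} // namespace

static_assert(InsnBitWidth<uint32_t> == 32, "specialization is picked up");
static_assert(InsnBitWidth<uint16_t> == 0, "unspecialized types use the default");
```

	Because the template lives in an anonymous namespace, each translation unit gets its own private copy, which is why no `static` is needed to keep it local.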
2025-09-01	[AMDGPU, RISCV] Fix warnings	Kazu Hirata	1	-4/+4
	This patch fixes:
	llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:451:13: error: explicit specialization cannot have a storage class [-Werror,-Wexplicit-specialization-storage-class]
	llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:452:13: error: explicit specialization cannot have a storage class [-Werror,-Wexplicit-specialization-storage-class]
	llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:454:1: error: explicit specialization cannot have a storage class [-Werror,-Wexplicit-specialization-storage-class]
	llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:456:1: error: explicit specialization cannot have a storage class [-Werror,-Wexplicit-specialization-storage-class]
	While I am at it, this patch changes the storage types of the InsnBitWidth specializations to "inline constexpr" to avoid linker errors.
2025-09-01	[LLVM][MC][DecoderEmitter] Add support to specialize decoder per bitwidth (#154865)	Rahul Joshi	2	-53/+22
	This change adds an option to specialize decoders per bitwidth, which can help reduce the (compiled) code size of the decoder code.
	Current state:
	Currently, the code generated by the decoder emitter consists of two key functions: `decodeInstruction`, which is the entry point into the generated code, and `decodeToMCInst`, which is invoked when a decode op is reached while traversing through the decoder table. Both functions are templated on `InsnType`, which is the raw instruction bits that are supplied to `decodeInstruction`.
	Several backends call `decodeInstruction` with different `InsnType` types, leading to several template instantiations of these functions in the final code. As an example, AMDGPU instantiates this function with type `DecoderUInt128` for decoding 96/128-bit instructions, `uint64_t` for decoding 64-bit instructions, and `uint32_t` for decoding 32-bit instructions.
	Since there is just one `decodeToMCInst` in the generated code, it has code that handles decoding for all instruction sizes. However, the decoders emitted for different instruction sizes rarely have any intersection with each other. That means, in the AMDGPU case, the instantiation with InsnType == DecoderUInt128 has decoder code for 32/64-bit instructions that is never exercised. Conversely, the instantiation with InsnType == uint64_t has decoder code for 128/96/32-bit instructions that is never exercised. This leads to unnecessary dead code in the generated disassembler binary (that the compiler cannot eliminate by itself).
	New state:
	With this change, we introduce an option `specialize-decoders-per-bitwidth`. Under this mode, the DecoderEmitter will generate several versions of the `decodeToMCInst` function, one for each bitwidth. The code is still templated, but will require backends to specify, for each `InsnType` used, the bitwidth of the instruction that the type is used to represent, using a type-trait `InsnBitWidth`. This will enable the templated code to choose the right variant of `decodeToMCInst`. Under this mode, a particular instantiation will only end up instantiating a single variant of `decodeToMCInst`, and that variant will include only those decoders that are applicable to a single bitwidth, resulting in elimination of the code duplication through instantiation and a reduction in code size.
	Additionally, under this mode, decoders are uniqued only within a given bitwidth (as opposed to across all bitwidths without this option), so the decoder index values assigned are smaller and consume fewer bytes in their ULEB128 encoding. As a result, the generated decoder tables can also shrink.
	Adopt this feature for the AMDGPU and RISCV backends. In a release build, this results in a net 55% reduction in the .text size of libLLVMAMDGPUDisassembler.so and a 5% reduction in the .rodata size. For RISCV, which today uses a single `uint64_t` type, this results in a 3.7% increase in code size (expected as we instantiate the code 3 times now).
	Actual measured sizes are as follows:
```
Baseline commit: 72c04bb882ad70230bce309c3013d9cc2c99e9a7
Configuration: Ubuntu clang version 18.1.3, release build with asserts disabled.

AMDGPU        Before        After         Change
======================================================
.text         612327        275607        55% reduction
.rodata       369728        351336        5% reduction

RISCV:
======================================================
.text         47407         49187         3.7% increase
.rodata       35768         35839         0.1% increase
```
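	The per-bitwidth selection described in this commit can be sketched roughly as below. Only the names `InsnBitWidth` and `decodeToMCInst` come from the commit message; `UInt128Like`, the `decodeToMCInst32/64/128` variants, and their integer return values are hypothetical stand-ins so the example stays self-contained. The real generated code is considerably more involved.

```cpp
#include <cstdint>

namespace {
// Hypothetical stand-in for a wide instruction type such as DecoderUInt128.
struct UInt128Like {
  uint64_t Lo;
  uint64_t Hi;
};

// Type trait: each InsnType states which instruction width it represents.
template <typename T> constexpr uint32_t InsnBitWidth = 0;
template <> constexpr uint32_t InsnBitWidth<uint32_t> = 32;
template <> constexpr uint32_t InsnBitWidth<uint64_t> = 64;
template <> constexpr uint32_t InsnBitWidth<UInt128Like> = 128;

// One variant per bitwidth. Here each just returns a tag so the chosen
// variant is observable; a real variant would hold that width's decoders.
template <typename InsnType> constexpr int decodeToMCInst32(const InsnType &) { return 32; }
template <typename InsnType> constexpr int decodeToMCInst64(const InsnType &) { return 64; }
template <typename InsnType> constexpr int decodeToMCInst128(const InsnType &) { return 128; }

// Dispatcher: `if constexpr` discards the untaken branches, so an
// instantiation for one InsnType only instantiates one variant.
template <typename InsnType> constexpr int decodeToMCInst(const InsnType &Insn) {
  constexpr uint32_t BitWidth = InsnBitWidth<InsnType>;
  static_assert(BitWidth == 32 || BitWidth == 64 || BitWidth == 128,
                "InsnBitWidth must be specialized for InsnType");
  if constexpr (BitWidth == 32)
    return decodeToMCInst32(Insn);
  else if constexpr (BitWidth == 64)
    return decodeToMCInst64(Insn);
  else
    return decodeToMCInst128(Insn);
}
} // namespace
```

	With this arrangement, `decodeToMCInst(uint32_t{0})` only pulls in the 32-bit variant, which is the mechanism the commit credits for the smaller `.text` section.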
2025-08-22	[AMDGPU] gfx1250 kernel descriptor update (#155008)	Stanislav Mekhanoshin	1	-12/+33

2025-08-21	[NFC][MC][Decoder] Extract fixed pieces of decoder code into new header file (#154802)	Rahul Joshi	1	-1/+2
	Extract fixed functions generated by decoder emitter into a new MCDecoder.h header.
2025-08-19	[AMDGPU] upstream barrier count reporting part1 (#154409)	Gang Chen	1	-2/+12

2025-08-05	[AMDGPU] Add MC support for new gfx1250 src_flat_scratch_base_lo/hi (#152203)	Stanislav Mekhanoshin	1	-0/+3

2025-08-04	[AMDGPU] Add gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 MC support (#152014)	Stanislav Mekhanoshin	1	-0/+7
	This adds the new VOP3PX2e encoding.
2025-07-29	[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588)	Pierre van Houtryve	1	-0/+3
	Determines whether we can use `SCOPE_CU` stores (on by default), or whether all stores must be done at `SCOPE_SE` minimum.
2025-07-21	AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684)	Changpeng Fang	2	-1/+47
	Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-11	[AMDGPU] MC support for v_fmaak_f64/v_fmamk_f64 gfx1250 instructions (#148282)	Stanislav Mekhanoshin	2	-0/+28

2025-07-09	[AMDGPU] gfx1250: MC support for 64-bit literals (#147861)	Stanislav Mekhanoshin	2	-0/+19

2025-07-09	[NFC][TableGen] Change DecoderEmitter `insertBits` to use integer types only (#147613)	Rahul Joshi	1	-17/+1
	The `insertBits` templated function generated by DecoderEmitter is called with variable `tmp` of type `TmpType` which is:
```
using TmpType = std::conditional_t<std::is_integral<InsnType>::value, InsnType, uint64_t>;
```
	That is, `TmpType` is always an integral type. Change the generated `insertBits` to be valid only for integer types, and eliminate the unused `insertBits` function from `DecoderUInt128` in AMDGPUDisassembler.h.
	Additionally, drop some of the requirements `InsnType` must support as they no longer seem to be required.
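	To make the argument concrete, here is a small sketch of why `TmpType` is always integral and what an integer-only `insertBits` might look like. The `TmpType` alias is taken from the commit message; `WideInsn` and the body of `insertBits` are assumptions written for this example and may differ from the generated code.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

template <typename InsnType>
using TmpType =
    std::conditional_t<std::is_integral<InsnType>::value, InsnType, uint64_t>;

// Hypothetical non-integral instruction type, standing in for DecoderUInt128.
struct WideInsn {
  uint64_t Lo;
  uint64_t Hi;
};

// Integral InsnType maps to itself; anything else falls back to uint64_t.
static_assert(std::is_same<TmpType<uint32_t>, uint32_t>::value, "kept as is");
static_assert(std::is_same<TmpType<WideInsn>, uint64_t>::value, "falls back");

// Sketch of an insertBits restricted to integer types via enable_if.
// It ORs Bits in at StartBit, assuming the target bits are still zero
// and that Bits fits in NumBits.
template <typename IntType>
std::enable_if_t<std::is_integral<IntType>::value>
insertBits(IntType &Field, IntType Bits, unsigned StartBit, unsigned NumBits) {
  assert(NumBits >= 1 && StartBit + NumBits <= sizeof(IntType) * 8 &&
         "field out of range");
  (void)NumBits; // only used by the assert above
  Field = static_cast<IntType>(Field | (Bits << StartBit));
}
```

	Since every call site passes a `TmpType`, a non-integral overload (such as one on a 128-bit wrapper class) would never be selected, which is consistent with the commit removing it.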
2025-06-25	[AMDGPU] Add the support for `v_cvt_f32_bf16` on gfx1250 (#145632)	Shilei Tian	1	-5/+9
	Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-06-24	[AMDGPU] Support v_lshl_add_u64 in gfx1250 (#145591)	Stanislav Mekhanoshin	1	-0/+5
	It also brings in some DPP changes needed to define it.
2025-06-23	AMDGPU: Use reportFatalUsageError for unsupported disassembly error (#145264)	Matt Arsenault	1	-1/+1

2025-06-21	[AMDGPU] Rename call instructions from b64 to i64 (#145103)	Stanislav Mekhanoshin	2	-0/+9
	These are renamed from B64 to I64 in gfx1250 and later: S_CALL_I64, S_GET_PC_I64, S_RFE_I64, S_SET_PC_I64, S_SWAP_PC_I64.
2025-06-17	[llvm] annotate interfaces in llvm/Target for DLL export (#143615)	Andrew Rogers	1	-1/+3
	## Purpose
	This patch is one in a series of code-mods that annotate LLVM’s public interface for export. This patch annotates the `llvm/Target` library. These annotations currently have no meaningful impact on the LLVM build; however, they are a prerequisite to support an LLVM Windows DLL (shared library) build.
	## Background
	This effort is tracked in #109483. Additional context is provided in [this discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307), and documentation for `LLVM_ABI` and related annotations is found in the LLVM repo [here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
	A subset of these changes was generated automatically using the [Interface Definition Scanner (IDS)](https://github.com/compnerd/ids) tool, followed by formatting with `git clang-format`.
	The bulk of this change is manual additions of `LLVM_ABI` to `LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target. Adding `LLVM_ABI` to the function implementation is required here because they do not `#include "llvm/Support/TargetSelect.h"`, which contains the declarations for these functions and was already updated with `LLVM_ABI` in a previous patch. I considered patching these files with `#include "llvm/Support/TargetSelect.h"` instead, but since TargetSelect.h is a large file with a bunch of preprocessor x-macro stuff in it, I was concerned it would unnecessarily impact compile times.
	In addition, a number of unit tests under llvm/unittests/Target required additional dependencies to make them build correctly against the LLVM DLL on Windows using MSVC.
	## Validation
	Local builds and tests to validate cross-platform compatibility. This included llvm, clang, and lldb on the following configurations:
	- Windows with MSVC
	- Windows with Clang
	- Linux with GCC
	- Linux with Clang
	- Darwin with Clang
2025-05-09	[AMDGPU][NFC] Remove _DEFERRED operands. (#139123)	Ivan Kosarev	2	-56/+8
	All immediates are deferred now.
2025-05-08	[AMDGPU][NFC] Get rid of OPW constants. (#139074)	Ivan Kosarev	2	-164/+163
	We can infer the widths from register classes and represent them as numbers.
2025-05-08	[AMDGPU][Disassembler][NFCI] Always defer immediate operands. (#138885)	Ivan Kosarev	2	-122/+124
	Removes the need to parameterise decoders with OperandSemantics, ImmWidth and MandatoryLiteral. Likely allows further simplification of handling _DEFERRED immediates. Tested to work downstream.
2025-04-18	[LLVM][TableGen] Move DecoderEmitter output to anonymous namespace (#136214)	Rahul Joshi	2	-32/+42
	- Move the code generated by DecoderEmitter to anonymous namespace.
	- Move AMDGPU's usage of this code from header file to .cpp file.
	Note, we get build errors like "call to function 'decodeInstruction' that is neither visible in the template definition nor found by argument-dependent lookup" if we do not change AMDGPU.
2025-03-19	[AMDGPU] Add intrinsic and MI for image_bvh_dual_intersect_ray (#130038)	Mariusz Sikora	1	-0/+1
	- Add llvm.amdgcn.image.bvh.dual.intersect.ray intrinsic and image_bvh_dual_intersect_ray machine instruction.
	- Add llvm_v10i32_ty and llvm_v10f32_ty
	Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
2025-03-08	[AMDGPU][MC] Don't crash on decoding invalid SOP1 ssrc0 operands. (#130302)	Ivan Kosarev	1	-15/+19
	These are encoded as 8-bit fields.
2025-02-27	[AMDGPU][MC] Disassembler warning for v_cmpx instructions (#127925)	Jun Wang	1	-1/+13
	For GFX10+ the destination reg of v_cmpx instructions is implicitly EXEC, which is encoded as 0x7E. However, the disassembler does not check this field, thus allowing any value. With this patch, if the field is not EXEC a warning is issued.
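	The check this commit describes amounts to comparing the destination field against the EXEC encoding. A minimal sketch follows; the value 0x7E is from the commit message, while `ExecDstEncoding`, `checkCmpxDstField`, and the warning text are hypothetical and do not reproduce the disassembler's actual code or wording.

```cpp
// Encoding of EXEC in the destination field, as stated in the commit message.
constexpr unsigned ExecDstEncoding = 0x7E;

// Hypothetical helper: returns nullptr when the field is the expected EXEC
// encoding, otherwise a (made-up) warning string for the caller to emit.
constexpr const char *checkCmpxDstField(unsigned DstField) {
  return DstField == ExecDstEncoding
             ? nullptr
             : "warning: v_cmpx destination field is not EXEC";
}
```

	Returning a message instead of failing mirrors the commit's choice to warn rather than reject: the instruction is still decoded, but the unexpected field value is surfaced.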
2025-02-26	[AMDGPU] Do not allow M0 as v_readfirstlane_b32 dst (#128851)	Pierre van Houtryve	1	-0/+1
	M0 can only be written to by the SALU, so `v_readfirstlane_b32 m0` is effectively useless. Represent this by restricting the dest RC of that instruction to `SReg_32_XM0`, which excludes M0. There are a lot of test changes due to the register class changing, but most changes are trivial. In some cases, an extra register and `s_mov_b32` are needed. Fixes SWDEV-513269.
2025-02-12	[TableGen] Emit OpName as an enum class instead of a namespace (#125313)	Rahul Joshi	1	-19/+21
	- Change InstrInfoEmitter to emit OpName as an enum class instead of an anonymous enum in the OpName namespace.
	- This will help clearly distinguish between values that are OpNames vs just operand indices and should help avoid bugs due to confusion between the two.
	- Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES.
	- Emit declaration of getOperandIdx() along with the OpName enum so it doesn't have to be repeated in various headers.
	- Also updated AMDGPU, RISCV, and WebAssembly backends to conform to the new definition of OpName (mostly mechanical changes).
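	The benefit of the enum class can be seen in a small sketch. The names `OpName`, `NUM_OPERAND_NAMES`, and `getOperandIdx` appear in the commit message; the enumerators, the `demo` namespace, the single-argument signature, and the operand layout below are all invented for illustration and do not match the TableGen-generated code.

```cpp
#include <cstdint>
#include <type_traits>

namespace demo {
// Scoped enum: operand names no longer convert implicitly to or from int.
enum class OpName : uint8_t { src0, src1, vdst, NUM_OPERAND_NAMES };

// Hypothetical lookup for one made-up instruction whose operands are laid
// out as (vdst, src0, src1). Returns -1 for names the instruction lacks.
constexpr int getOperandIdx(OpName Name) {
  switch (Name) {
  case OpName::vdst:
    return 0;
  case OpName::src0:
    return 1;
  case OpName::src1:
    return 2;
  case OpName::NUM_OPERAND_NAMES:
    break;
  }
  return -1;
}

// An operand index cannot be passed where an OpName is expected, and an
// OpName cannot silently be used as an index.
static_assert(!std::is_convertible<int, OpName>::value, "no int -> OpName");
static_assert(!std::is_convertible<OpName, int>::value, "no OpName -> int");
} // namespace demo
```

	With the previous unscoped enum, a call such as `getOperandIdx(1)` would have compiled; with the scoped form it is a type error, which is the class of bug the commit aims to prevent.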
2025-02-11	[AMDGPU] Create new directive .amdhsa_inst_pref_size (#126622)	Stanislav Mekhanoshin	1	-4/+4
	The field INST_PREF_SIZE is available since gfx11.
2025-01-14	[AMDGPU][True16][MC] true16 for v_cmp_lt_f16 (#122499)	Brox Chen	2	-2/+18
	True16 format for v_cmp_lt_f16. Update VOPC t16 and fake16 pseudo.
2025-01-03	[AMDGPU][MC] Allow null where 128b or larger dst reg is expected (#115200)	Jun Wang	2	-0/+26
	For GFX10+, currently null cannot be used as dst reg in instructions that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4). This patch fixes this problem while ensuring null cannot be used as S#, T#, or V#.
2024-11-25	AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598)	Matt Arsenault	1	-0/+8
	The encoding of the v_dot2c_f32_bf16 opcode is the same as v_mac_f32 in gfx90a, both in the gfx9 series. This required a new decoderNameSpace GFX950_DOT.
	Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25	AMDGPU: Support v_cvt_scalef32_pk32_{bf\|f}6_{bf\|fp}16 for gfx950 (#117592)	Matt Arsenault	1	-0/+1
	Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-25	AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp\|bf]6 of gfx950 (#117590)	Matt Arsenault	2	-0/+2
	Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-23	AMDGPU: Remove wavefrontsize64 feature from dummy target (#117410)	Matt Arsenault	1	-8/+8
	This is a refinement for the existing hack. With this, the default target will have neither wavefrontsize feature present, unless it was explicitly specified. That is, getWavefrontSize() == 64 no longer implies +wavefrontsize64. getWavefrontSize() == 32 does imply +wavefrontsize32. Continue to assume the value is 64 with no wavesize feature. This maintains the codegenable property without any code that directly cares about the wavesize needing to worry about it. Introduce an isWaveSizeKnown helper to check if we know the wavesize is accurate based on having one of the features explicitly set, or a known target-cpu. I'm not sure what's going on in wave_any.s. It's testing what happens when both wavesizes are enabled, but this is treated as an error in codegen. We now treat wave32 as the winning case, so some cases that were previously printed as vcc are now vcc_lo.
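	The rules stated in this message can be summarized in a short sketch. `getWavefrontSize` and `isWaveSizeKnown` are named in the commit message, but the `WaveFeatures` struct, the free-function signatures, and the bodies are assumptions made here to restate the prose; the real subtarget code is organized differently.

```cpp
// Hypothetical flattened view of the relevant subtarget state.
struct WaveFeatures {
  bool Wave32;         // +wavefrontsize32 explicitly present
  bool Wave64;         // +wavefrontsize64 explicitly present
  bool KnownTargetCPU; // a concrete target-cpu was given
};

// wave32 wins when set (even if both features are present); with no
// wavesize feature at all the value is still assumed to be 64.
constexpr unsigned getWavefrontSize(const WaveFeatures &F) {
  return F.Wave32 ? 32u : 64u;
}

// The size is only trustworthy if a feature was set explicitly or the
// target CPU is known.
constexpr bool isWaveSizeKnown(const WaveFeatures &F) {
  return F.Wave32 || F.Wave64 || F.KnownTargetCPU;
}
```

	Under this reading, a default target reports a size of 64 while `isWaveSizeKnown` returns false, so code that only needs some valid size keeps working and code that depends on the exact size can check first.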
2024-11-23	AMDGPU: Move default wavesize hack for disassembler (#117422)	Matt Arsenault	1	-18/+2
	You cannot adjust the disassembler's subtarget. llvm-mc passes the originally constructed MCSubtargetInfo around, rather than querying the pointer in the disassembler instance.
2024-11-21	AMDGPU: Define v_mfma_f32_{16x16x128\|32x32x64}_f8f6f4 instructions (#116723)	Matt Arsenault	2	-0/+76
	These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp). I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns. The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.
2024-11-20	[AMDGPU][MC][True16] Support VOP2 instructions with true16 format (#115233)	Brox Chen	1	-0/+19
	Support true16 format for VOP2 instructions in MC. This patch updates the true16 and fake16 vop_profile for the following instructions and updates the asm/dasm tests: v_fmac_f16, v_fmamk_f16, v_fmaak_f16. It seems the vop2_t16_promote.s files were not updated with the true16 flag in the previous batch update. They will be updated separately.
2024-11-14	[AMDGPU][True16][MC] VINTERP instructions supporting true16/fake16 (#113634)	Brox Chen	1	-8/+30
	Update VInterp instructions with true16 and fake16 formats. This patch includes the instructions: v_interp_p10_f16_f32, v_interp_p2_f16_f32, v_interp_p10_rtz_f16_f32, v_interp_p2_rtz_f16_f32. The dasm test vinterp-fake16.txt is removed and its test lines are merged into vinterp.txt, which handles both true16 and fake16 cases.
2024-10-03	[AMDGPU] Qualify auto. NFC. (#110878)	Jay Foad	1	-18/+18
	Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)
2024-09-28	[AMDGPU] Use MCRegister. NFC	Craig Topper	1	-7/+7

2024-09-24	[AMDGPU][MC] Disallow null as saddr in flat instructions (#101730)	Jun Wang	1	-0/+1
	Some flat instructions have an saddr operand. When 'null' is provided as saddr, it may have the same encoding as another instruction. For example, the instructions 'global_atomic_add v1, v2, null' and 'global_atomic_add v[1:2], v2, off' have the same encoding. This patch disallows having null as saddr.
2024-09-20	[AMDGPU] Do not use APInt for simple 64-bit arithmetic. NFC. (#109414)	Jay Foad	1	-4/+2

2024-09-11	[AMDGPU][True16][MC] 16bit vsrc and vdst support in MC (#104510)	Brox Chen	1	-8/+14
	This large patch includes the MC-level support for V_CVT_F16_F32, V_CVT_F32_F16 and V_LDEXP_F16 in true16 format. It includes the asm/disasm changes to encode/decode the 16-bit vsrc, vdst and src modifiers for the vop and dpp formats. This patch is a dependency for many 16-bit instructions, but only three instructions are updated to make it easier to review. There will be another patch to support these three instructions at the codegen level; this patch just replaces these two instructions with their fake16 format.
2024-08-26	[MC] Update MCOperand::getReg/setReg/createReg and MCInstBuilder::addReg to use MCRegister. (#106015)	Craig Topper	1	-1/+1
	Replace unsigned with MCRegister. Update some ternary operators that started giving errors.
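	One plausible reason ternary operators start failing after such a change is sketched below, using a hypothetical `RegId` wrapper that converts both from and to `unsigned`. This is an illustration of the general C++ rule, not the code touched by the commit; `RegId` and `pickReg` are invented names.

```cpp
// Hypothetical register wrapper with implicit conversions in both directions.
class RegId {
  unsigned Id;

public:
  constexpr RegId(unsigned Val = 0) : Id(Val) {}
  constexpr operator unsigned() const { return Id; }
};

constexpr RegId pickReg(bool UseReg, RegId Reg) {
  // `UseReg ? Reg : 0` would be ill-formed here: RegId converts to int and
  // int converts to RegId, so the conditional operator cannot pick a common
  // type. Spelling out the wrapper type on both arms removes the ambiguity.
  return UseReg ? Reg : RegId();
}
```

	When both operands were plain `unsigned`, the same expression had an obvious common type, which is why such code compiled before the wrapper type was introduced.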
2024-07-17	[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298)	Jay Foad	1	-12/+10