| Age | Commit message (Collapse) | Author | Files | Lines |
|
Add `DX10ClampAndIEEEMode` feature and set it for every subtarget prior
to gfx1170
|
|
|
|
|
|
Introduce two new subtarget features:
- WMMA256bInsts for GFX11 WMMA instructions and
- WMMA128bInsts for GFX1170 and GFX12 WMMA and SWMMAC instructions
Some WMMA instructions have changed from GFX 11.0 to GFX 11.7 so new
Real versions were added with "_gfx1170" suffix. For consistency all
WMMA and SWMMAC GFX11.7 instructions use this suffix.
To resolve decoding issues between different formats for some WMMA
instructions between GFX 11 and GFX 11.7, new decoding tables were
added.
|
|
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
|
|
For now list of features is based on gfx12 and gfx1250
---------
Co-authored-by: Jay Foad <jay.foad@amd.com>
|
|
This PR handles`v_pk_fmac_f16` inline constant encoding/decoding
differences between pre-GFX11 and GFX11+ hardware.
- Pre-GFX11: fp16 inline constants produce `(f16, 0)` - value in low 16
bits, zero in high.
- GFX11+: fp16 inline constants are duplicated to both halves `(f16,
f16)`.
Fixes #94116.
|
|
Use MCRegister instead of MCPhysReg or use MCRegister::id().
|
|
In GFX10+, the v_cmpx_* instructions use EXEC as the implicit dst and do
not have explicit dst. Therefore a warning is issued by the disassembler
when the dst is not EXEC. However, in GFX9 and earlier, those
instructions have EXEC as the implicit dst as well as an explicit dst.
The aforementioned warning should not be issued.
|
|
Fixes round-tripping where literals used to be reassembled into
inline constants.
Also fix the %extract-encodings substitution in lit tests to emit
each instruction code once and not twice.
Eliminate the Literal64 field.
|
|
There should normally be no need to generate implicit lit64()
modifiers on the assembler side. It's the encoder's responsibility
to recognise literals that are implicitly 64 bits wide.
The exceptions are where we rewrite floating-point operand values
as integer ones, which would not be assembled back to the original
values unless wrapped into lit64().
Respect explicit lit() modifiers for non-inline values as
necessary to avoid regressions in MC tests. This change still
doesn't prevent use of inline constants where lit()/lit64 is
specified; subject to a separate patch.
On disassembling, only create lit64() operands where necessary for
correct round-tripping.
Add round-tripping tests where useful and feasible.
|
|
(#158272)
This removes special case processing in TargetInstrInfo::getRegClass to
fixup register operands which depending on the subtarget support AGPRs,
or require even aligned registers.
This regresses assembler diagnostics, which currently work by hackily
accepting invalid cases and then post-rejecting a validly parsed
instruction.
On the plus side this now emits a comment when disassembling unaligned
registers for targets with the alignment requirement.
|
|
instead of hard code zero (#161771)
There is no test change at this moment because we don't have a target
that has this feature by default yet.
|
|
And rework the lit64() support to use it.
The rules for when to add lit64() can be simplified and
improved. In this change, however, we just follow the existing
conventions on the assembler and disassembler sides.
In codegen we do not (and normally should not need to) add explicit
lit() and lit64() modifiers, so the codegen tests lose them. The change
is an NFCI otherwise.
Simplifies printing operands.
|
|
This was additional hacking around using incorrect register class
constraints for paired data operands. I'm not really sure why we
need any of what's left. In particular the IS_VGPR special case
seems backwards from how the encoding works.
|
|
This reverts commit be17791f2624f22b3ed24a2539406164a379125d.
This is not necessary for gfx1250 anymore.
|
|
GFX12+ buffer ops require positive InstOffset per AMD hardware spec.
Modified assembler/disassembler to reject negative buffer offsets.
|
|
This is a baseline support, it is not useable yet.
|
|
a storage class (#156375)
Move `InsnBitWidth` template into anonymous namespace in the generated
code and move template specialization of `InsnBitWidth` to anonymous
namespace as well, and drop `static` for them. This makes `InsnBitWidth`
completely private to each target and fixes the "explicit specialization
cannot have a storage class" warning as well as any potential linker
errors if `InsnBitWidth` is kept in the `llvm::MCD` namespace.
|
|
This patch fixes:
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:451:13:
error: explicit specialization cannot have a storage class
[-Werror,-Wexplicit-specialization-storage-class]
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:452:13:
error: explicit specialization cannot have a storage class
[-Werror,-Wexplicit-specialization-storage-class]
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:454:1:
error: explicit specialization cannot have a storage class
[-Werror,-Wexplicit-specialization-storage-class]
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:456:1:
error: explicit specialization cannot have a storage class
[-Werror,-Wexplicit-specialization-storage-class]
While I am at it, this patch changes the storage types of InsnBitWidth
specilization to "inline constexpr" to avoid linker errors.
|
|
(#154865)
This change adds an option to specialize decoders per bitwidth, which
can help reduce the (compiled) code size of the decoder code.
**Current state**:
Currently, the code generated by the decoder emitter consists of two key
functions: `decodeInstruction` which is the entry point into the
generated code and `decodeToMCInst` which is invoked when a decode op is
reached while traversing through the decoder table. Both functions are
templated on `InsnType` which is the raw instruction bits that are
supplied to `decodeInstruction`.
Several backends call `decodeInstruction` with different `InsnType`
types, leading to several template instantiations of these functions in
the final code. As an example, AMDGPU instantiates this function with
type `DecoderUInt128` type for decoding 96/128-bit instructions,
`uint64_t` for decoding 64-bit instructions, and `uint32_t` for decoding
32-bit instructions. Since there is just one `decodeToMCInst` in the
generated code, it has code that handles decoding for *all* instruction
sizes. However, the decoders emitted for different instructions sizes
rarely have any intersection with each other. That means, in the AMDGPU
case, the instantiation with InsnType == DecoderUInt128 has decoder code
for 32/64-bit instructions that is *never exercised*. Conversely, the
instantiation with InsnType == uint64_t has decoder code for
128/96/32-bit instructions that is never exercised. This leads to
unnecessary dead code in the generated disassembler binary (that the
compiler cannot eliminate by itself).
**New state**:
With this change, we introduce an option
`specialize-decoders-per-bitwidth`. Under this mode, the DecoderEmitter
will generate several versions of `decodeToMCInst` function, one for
each bitwidth. The code is still templated, but will require backends to
specify, for each `InsnType` used, the bitwidth of the instruction that
the type is used to represent using a type-trait `InsnBitWidth`. This
will enable the templated code to choose the right variant of
`decodeToMCInst`. Under this mode, a particular instantiation will only
end up instantiating a single variant of `decodeToMCInst` generated and
that will include only those decoders that are applicable to a single
bitwidth, resulting in elimination of the code duplication through
instantiation and a reduction in code size.
Additionally, under this mode, decoders are uniqued only within a given
bitwidth (as opposed to across all bitwidths without this option), so
the decoder index values assigned are smaller, and consume less bytes in
their ULEB128 encoding. As a result, the generated decoder tables can
also reduce in size.
Adopt this feature for the AMDGPU and RISCV backend. In a release build,
this results in a net 55% reduction in the .text size of
libLLVMAMDGPUDisassembler.so and a 5% reduction in the .rodata size. For
RISCV, which today uses a single `uint64_t` type, this results in a 3.7%
increase in code size (expected as we instantiate the code 3 times now).
Actual measured sizes are as follows:
```
Baseline commit: 72c04bb882ad70230bce309c3013d9cc2c99e9a7
Configuration: Ubuntu clang version 18.1.3, release build with asserts disabled.
AMDGPU Before After Change
======================================================
.text 612327 275607 55% reduction
.rodata 369728 351336 5% reduction
RISCV:
======================================================
.text 47407 49187 3.7% increase
.rodata 35768 35839 0.1% increase
```
|
|
|
|
(#154802)
Extract fixed functions generated by decoder emitter into a new
MCDecoder.h header.
|
|
|
|
|
|
This adds new VOP3PX2e encoding
|
|
Determines whether we can use `SCOPE_CU` stores (on by default), or
whether all stores must be done at `SCOPE_SE` minimum.
|
|
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
|
|
|
|
|
|
(#147613)
The `insertBits` templated function generated by DecoderEmitter is
called with variable `tmp` of type `TmpType` which is:
```
using TmpType = std::conditional_t<std::is_integral<InsnType>::value, InsnType, uint64_t>;
```
That is, `TmpType` is always an integral type. Change the generated
`insertBits` to be valid only for integer types, and eliminate the
unused `insertBits` function from `DecoderUInt128` in
AMDGPUDisassembler.h
Additionally, drop some of the requirements `InsnType` must support as
they no longer seem to be required.
|
|
Co-authored-by: Shilei Tian <i@tianshilei.me>
|
|
It also brings in some DPP changes needed to define it.
|
|
|
|
These get renamed in gfx1250 and on from B64 to I64:
S_CALL_I64
S_GET_PC_I64
S_RFE_I64
S_SET_PC_I64
S_SWAP_PC_I64
|
|
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Target` library.
These annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
A sub-set of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.
The bulk of this change is manual additions of `LLVM_ABI` to
`LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target.
Adding `LLVM_ABI` to the function implementation is required here
because they do not `#include "llvm/Support/TargetSelect.h"`, which
contains the declarations for this functions and was already updated
with `LLVM_ABI` in a previous patch. I considered patching these files
with `#include "llvm/Support/TargetSelect.h"` instead, but since
TargetSelect.h is a large file with a bunch of preprocessor x-macro
stuff in it I was concerned it would unnecessarily impact compile times.
In addition, a number of unit tests under llvm/unittests/Target required
additional dependencies to make them build correctly against the LLVM
DLL on Windows using MSVC.
## Validation
Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
|
|
All immediates are deferred now.
|
|
We can infer the widths from register classes and represent them as
numbers.
|
|
Removes the need to parameterise decoders with OperandSemantics,
ImmWidth and MandatoryLiteral.
Likely allows further simplification of handling _DEFERRED immediates.
Tested to work downstream.
|
|
- Move the code generated by DecoderEmitter to anonymous namespace.
- Move AMDGPU's usage of this code from header file to .cpp file.
Note, we get build errors like "call to function 'decodeInstruction'
that is neither visible in the template definition nor found by
argument-dependent lookup" if we do not change AMDGPU.
|
|
- Add llvm.amdgcn.image.bvh.dual.intersect.ray intrinsic and
image_bvh_dual_intersect_ray machine instruction.
- Add llvm_v10i32_ty and llvm_v10f32_ty
---------
Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
|
|
These are encoded as 8-bit fields.
|
|
For GFX10+ the destination reg of v_cmpx instructions is implicitly EXEC,
which is encoded as 0x7E. However, the disassembler does not check this
field, thus allowing any value. With this patch, if the field is not
EXEC a warning is issued.
|
|
M0 can only be written to by the SALU, so `v_readfirstlane_b32 m0` is
effectively useless. Represent this by restricting the dest RC of that
instruction to `SReg_32_XM0` which excludes M0.
There is a lot of test changes due to the register class changing, but
most changes are trivial. In some cases, an extra register and
`s_mov_b32` is needed.
Fixes SWDEV-513269
|
|
- Change InstrInfoEmitter to emit OpName as an enum class
instead of an anonymous enum in the OpName namespace.
- This will help clearly distinguish between values that are
OpNames vs just operand indices and should help avoid
bugs due to confusion between the two.
- Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES.
- Emit declaration of getOperandIdx() along with the OpName
enum so it doesn't have to be repeated in various headers.
- Also updated AMDGPU, RISCV, and WebAssembly backends
to conform to the new definition of OpName (mostly
mechanical changes).
|
|
The field INST_PREF_SIZE is available since gfx11.
|
|
True16 format for v_cmp_lt_f16. Update VOPC t16 and fake16 pseudo.
|
|
For GFX10+, currently null cannot be used as dst reg in instructions
that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4).
This patch fixes this problem while ensuring null cannot be used as S#,
T#, or V#.
|
|
The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a,
both from gfx9 series. This required a new decoderNameSpace GFX950_DOT.
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
|
|
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
|