aboutsummaryrefslogtreecommitdiff
path: root/llvm/utils/TableGen/DecoderEmitter.cpp
AgeCommit message (Collapse)AuthorFilesLines
7 days[TableGen][DecoderEmitter][RISCV] Always handle `bits<0>` (#159951)Sergei Barannikov1-2/+0
Previously, `bits<0>` only had effect if `ignore-non-decodable-operands` wasn't specified. Handle it even if the option was specified. This should allow for a smoother transition to the option removed. The change revealed a couple of inaccuracies in RISCV compressed instruction definitions. * `C_ADDI4SPN` has `bits<5> rs1` field, but `rs1` is not encoded. It should be `bits<0>`. * `C_ADDI16SP` has `bits<5> rd` in the base class, but it is unused since `Inst{11-7}` is overwritten with constant bits. We should instead set `rd = 2` and `Inst{11-7} = rd`. There are a couple of alternative fixes, but this one is the shortest.
9 days[TableGen][DecoderEmitter] Rework table construction/emission (#155889)Sergei Barannikov1-554/+148
### Current state We have FilterChooser class, which can be thought of as a **tree of encodings**. Tree nodes are instances of FilterChooser itself, and come in two types: * A node containing single encoding that has *constant* bits in the specified bit range, a.k.a. singleton node. * A node containing only child nodes, where each child represents a set of encodings that have the same *constant* bits in the specified bit range. Either of these nodes can have an additional child, which represents a set of encodings that have some *unknown* bits in the same bit range. As can be seen, the **data structure is very high level**. The encoding tree represented by FilterChooser is then converted into a finite-state machine (FSM), represented as **byte array**. The translation is straightforward: for each node of the tree we emit a sequence of opcodes that check encoding bits and predicates for each encoding. For a singleton node we also emit a terminal "decode" opcode. The translation is done in one go, and this has negative consequences: * We miss optimization opportunities. * We have to use "fixups" when encoding transitions in the FSM since we don't know the size of the data we want to jump over in advance. We have to emit the data first and then fix up the location of the jump. This means the fixup size has to be large enough to encode the longest jump, so **most of the transitions are encoded inefficiently**. * Finally, when converting the FSM into human readable form, we have to **decode the byte array we've just emitted**. This is also done in one go, so we **can't do any pretty printing**. ### This PR We introduce an intermediary data structure, decoder tree, that can be thought as **AST of the decoder program**. This data structure is **low level** and as such allows for optimization and analysis. It resolves all the issues listed above. We now can: * Emit more optimal opcode sequences. * Compute the size of the data to be emitted in advance, avoiding fixups. * Do pretty printing. Serialization is done by a new class, DecoderTableEmitter, which converts the AST into a FSM in **textual form**, streamed right into the output file. ### Results * The new approach immediately resulted in 12% total table size savings across all in-tree targets, without implementing any optimizations on the AST. Many tables observe ~20% size reduction. * The generated file is much more readable. * The implementation is arguably simpler and more straightforward (the diff is only +150~200 lines, which feels rather small for the benefits the change gives).
9 days[TableGen] Remove unused Target from InstructionEncoding methods (NFC) (#159833)Sergei Barannikov1-8/+5
10 daysCodeGen: Add RegisterClass by HwMode (#158269)Matt Arsenault1-3/+59
This is a generalization of the LookupPtrRegClass mechanism. AMDGPU has several use cases for swapping the register class of instruction operands based on the subtarget, but none of them really fit into the box of being pointer-like. The current system requires manual management of an arbitrary integer ID. For the AMDGPU use case, this would end up being around 40 new entries to manage. This just introduces the base infrastructure. I have ports of all the target specific usage of PointerLikeRegClass ready.
11 days[TableGen][DecoderEmitter] Sink repeated code after the switch (NFC) (#159549)Sergei Barannikov1-41/+20
11 days[TableGen][DecoderEmitter] Simplify FilterChooser::getIslands() (NFC) (#159218)Sergei Barannikov1-52/+35
Also turn the method into a static function so it can be used without an instance of the class.
12 days[TableGen][DecoderEmitter] Inline `insertBits()` (NFC) (#159353)Sergei Barannikov1-3/+7
`tmp` is always of integer type, so we can use bitwise OR and shift.
12 days[TableGen][DecoderEmitter] Merge OPC_Decode with OPC_TryDecode (#159178)Sergei Barannikov1-44/+9
OPC_Decode is a specialized OPC_TryDecode. The difference between them is that OPC_TryDecode performs a "completeness check", while OPC_Decode asserts that the check passes. The check is just a boolean test, which is nothing compared to the complexity of the decoding process, so there is no point in having a special opcode that optimizes the check.
13 days[TableGen][DecoderEmitter] Replace opcode mask with booleans (NFC) (#159113)Sergei Barannikov1-34/+27
Extracted from #155889, which removes inclusion of `MCDecoderOps.h`.
13 days[TableGen][DecoderEmitter] Change SmallSetVector to SetVector (NFC) (#159108)Sergei Barannikov1-5/+5
SmallSetVector is too optimistic, there are usually more than 16 unique decoders and predicates. Modernize `typedef` to `using` while here.
13 days[TableGen][DecoderEmitter] Inline a couple of trivial functions (NFC) (#159099)Sergei Barannikov1-29/+17
13 days[TableGen][Decoder] Make predicate/decocder generation functions return a ↵Sergei Barannikov1-27/+21
string (NFC) (#159089) These functions will see more uses in a future patch. This also resolves a FIXME.
13 days[TableGen][DecoderEmitter] Turn some methods into static functions (NFC) ↵Sergei Barannikov1-29/+19
(#158789) DecoderTableBuilder will be removed. Move out the class the methods that will remain.
13 days[TableGen][DecoderEmitter] Add a few DecoderTableInfo helpers (NFC) (#158776)Sergei Barannikov1-13/+24
13 days[NFC][DecoderEmitter] Remove unused `emitPredicateMatchAux` (#158771)Rahul Joshi1-37/+0
13 days[NFC][DecoderEmitter] Predicate generation code cleanup (#158140)Rahul Joshi1-43/+16
Eliminate `doesOpcodeNeedPredicate` and instead have `emitPredicateMatch` return true if any predicates were generated. Delegate actual predicate generation in `emitPredicateMatch` to `SubtargetFeatureInfo::emitMCPredicateCheck`. Additionally, remove the redundant parenthesis around the predicate conditions in the generated `checkDecoderPredicate` function. Note that for ARM/AMDGPU this reduces the total # of predicates generated by a few. It seems the old code would sometimes generate duplicate predicates which were identical in semantics but one had an extra pair of parentheses (i..e, `X` and `(X)`). `emitMCPredicateCheck` does not seems to have that issue.
13 days[NFC][DecoderEmitter] Code cleanup in `DecoderEmitter::emitTable` (#158014)Rahul Joshi1-108/+132
Several code cleanup changes in code to emit decoder tables: - Start comments on each line at a fixed column for readibility. - Combine repeated code to decode and emit ULEB128 into a single function. - Add helper `getDecoderOpName` to print decoder op. - Print Filter/CheckField/predicate index values with those opcodes.
14 days[TableGen] Extract InstructionEncoding class into a separate file (NFC) ↵Sergei Barannikov1-535/+14
(#158505) So that it can be used in CodeEmitterGen / VarLenCodeEmitterGen.
2025-09-12[TableGen][DecoderEmitter] Decode operands with "all zeros" encoding (#158163)Sergei Barannikov1-25/+22
Follow-up to #156358. The original change didn't take into account operands with "all zeros" encoding, now fixed.
2025-09-10[TableGen][DecoderEmitter] Report all decoding conflicts (#157847)Rahul Joshi1-4/+21
Do not exit when the first decoding conflict is encountered. Instead record the conflict and continue to report any additional decoding conflicts and exit fatally after all instructions have been processed.
2025-09-06[TableGen][DecoderEmitter] Inline reportRegion method (NFC) (#157266)Sergei Barannikov1-24/+14
2025-09-06[TableGen][DecoderEmitter] Add OPC_Scope opcode (#155580)Sergei Barannikov1-163/+100
This change introduces OPC_Scope opcode, whose only purpose is to record a continuation point to resume at if a subsequent opcode fails. Each OPC_Scope pushes an entry onto the scope stack; an entry is popped if an opcode in the scope fails. Previously, we recorded this information on several opcodes, it has been removed. A series of such opcodes often referred to the same continuation point; this information is now recorded in one place, reducing table sizes in most cases. Average reduction is 1.1%, some table observe up to 7% reduction in size. The new behavior of those opcodes is "check or leave scope". If we're in the outermost scope (scope stack is empty), they act as "check or fail". There is one opcode, OPC_FilterValueOrSkip that behaves like the old OPC_FilterValue. It is special because it acts as a case of a switch statement and has nothing to do with scopes. (If a case fails, we should try the next case instead of leaving the current scope.)
2025-09-04[TableGen][Decoder] Decode operands with zero width or all bits known (#156358)Sergei Barannikov1-23/+82
There are two classes of operands that DecoderEmitter cannot currently handle: 1. Operands that do not participate in instruction encoding. 2. Operands whose encoding contains only 1s and 0s. Because of this, targets developed various workarounds. Some targets insert missing operands after an instruction has been (incompletely) decoded, other take into account the missing operands when printing the instruction. Some targets do neither of that and fail to correctly disassemble some instructions. This patch makes it possible to decode both classes of operands and allows to remove existing workarounds. For the case of operand with no contribution to instruction encoding, one should now add `bits<0> OpName` field to instruction encoding record. This will make DecoderEmitter generate a call to the decoder function specified by the operand's DecoderMethod. The function has a signature different from the usual one and looks like this: ``` static DecodeStatus DecodeImm42Operand(MCInst &Inst, const MCDisassembler *Decoder) { Inst.addOperand(MCOperand::createImm(42)); return DecodeStatus::Success; } ``` Notably, encoding bits are not passed to it (since there are none). There is nothing special about the second case, the operand bits are passed as usual. The difference is that before this change, the function was not called if all the bits of the operand were known (no '?' in the operand encoding). There are two options controlling the behavior. Passing an option enables the old behavior. They exist to allow smooth transition to the new behavior. They are temporary (yeah, I know) and will be removed once all targets migrate, possibly giving some more time to downstream targets. Subsequent patches in the stack enable the new behavior on some in-tree targets.
2025-09-04[NFC][MC][DecoderEmitter] Refactor code related to EncodingField (#156759)Rahul Joshi1-19/+39
2025-09-04[LLVM][MC][DecoderEmitter] Fail fatally if `Insn` and decoder table ↵Rahul Joshi1-3/+26
bitwidths mismatch (#156734)
2025-09-02[MC][DecoderEmitter] Fix build warning: explicit specialization cannot have ↵Rahul Joshi1-0/+8
a storage class (#156375) Move `InsnBitWidth` template into anonymous namespace in the generated code and move template specialization of `InsnBitWidth` to anonymous namespace as well, and drop `static` for them. This makes `InsnBitWidth` completely private to each target and fixes the "explicit specialization cannot have a storage class" warning as well as any potential linker errors if `InsnBitWidth` is kept in the `llvm::MCD` namespace.
2025-09-01[LLVM][MC][DecoderEmitter] Add support to specialize decoder per bitwidth ↵Rahul Joshi1-41/+104
(#154865) This change adds an option to specialize decoders per bitwidth, which can help reduce the (compiled) code size of the decoder code. **Current state**: Currently, the code generated by the decoder emitter consists of two key functions: `decodeInstruction` which is the entry point into the generated code and `decodeToMCInst` which is invoked when a decode op is reached while traversing through the decoder table. Both functions are templated on `InsnType` which is the raw instruction bits that are supplied to `decodeInstruction`. Several backends call `decodeInstruction` with different `InsnType` types, leading to several template instantiations of these functions in the final code. As an example, AMDGPU instantiates this function with type `DecoderUInt128` type for decoding 96/128-bit instructions, `uint64_t` for decoding 64-bit instructions, and `uint32_t` for decoding 32-bit instructions. Since there is just one `decodeToMCInst` in the generated code, it has code that handles decoding for *all* instruction sizes. However, the decoders emitted for different instructions sizes rarely have any intersection with each other. That means, in the AMDGPU case, the instantiation with InsnType == DecoderUInt128 has decoder code for 32/64-bit instructions that is *never exercised*. Conversely, the instantiation with InsnType == uint64_t has decoder code for 128/96/32-bit instructions that is never exercised. This leads to unnecessary dead code in the generated disassembler binary (that the compiler cannot eliminate by itself). **New state**: With this change, we introduce an option `specialize-decoders-per-bitwidth`. Under this mode, the DecoderEmitter will generate several versions of `decodeToMCInst` function, one for each bitwidth. The code is still templated, but will require backends to specify, for each `InsnType` used, the bitwidth of the instruction that the type is used to represent using a type-trait `InsnBitWidth`. This will enable the templated code to choose the right variant of `decodeToMCInst`. Under this mode, a particular instantiation will only end up instantiating a single variant of `decodeToMCInst` generated and that will include only those decoders that are applicable to a single bitwidth, resulting in elimination of the code duplication through instantiation and a reduction in code size. Additionally, under this mode, decoders are uniqued only within a given bitwidth (as opposed to across all bitwidths without this option), so the decoder index values assigned are smaller, and consume less bytes in their ULEB128 encoding. As a result, the generated decoder tables can also reduce in size. Adopt this feature for the AMDGPU and RISCV backend. In a release build, this results in a net 55% reduction in the .text size of libLLVMAMDGPUDisassembler.so and a 5% reduction in the .rodata size. For RISCV, which today uses a single `uint64_t` type, this results in a 3.7% increase in code size (expected as we instantiate the code 3 times now). Actual measured sizes are as follows: ``` Baseline commit: 72c04bb882ad70230bce309c3013d9cc2c99e9a7 Configuration: Ubuntu clang version 18.1.3, release build with asserts disabled. AMDGPU Before After Change ====================================================== .text 612327 275607 55% reduction .rodata 369728 351336 5% reduction RISCV: ====================================================== .text 47407 49187 3.7% increase .rodata 35768 35839 0.1% increase ```
2025-08-31[NFC][MC][DecoderEmitter] Simplify loop to find the best filter (#156237)Rahul Joshi1-19/+6
We can just use `max_element` on the array of filters.
2025-08-31[TableGen][Decoder] Remove special case of single sub-op dag (#156175)Sergei Barannikov1-10/+4
If a custom operand has MIOperandInfo with >= 2 sub-operands, it is required that either the operand or its sub-operands have a decoder method (depending on usage). Require this for single sub-operand operands as well, since there is no good reason not to. There are no changes in the generated files.
2025-08-30[TableGen][Decoder] Simplify parseFixedLenOperands (NFCI) (#156181)Sergei Barannikov1-47/+15
Use information from CGIOperandList instead of re-parsing operand dags from scratch.
2025-08-29[TableGen][Decoder] Cache DecoderNamespace in InstructionEncoding (NFC) ↵Sergei Barannikov1-4/+9
(#156059)
2025-08-29[TableGen][DecoderEmitter] Use StringRef in a few places (NFC) (#156051)Sergei Barannikov1-7/+5
2025-08-28[TableGen][DecoderEmitter] Simplify emitSoftFailTableEntry (NFC) (#155863)Sergei Barannikov1-30/+11
2025-08-26[TableGen][DecoderEmitter] Optimize single-case OPC_ExtractField (#155414)Sergei Barannikov1-18/+17
OPC_ExtractField followed by a single OPC_FilterValue is equivalent to OPC_CheckField. Optimize this relatively common case.
2025-08-26[TableGen][DecoderEmitter] Remove no longer needed MaxFilterWidth (NFC) ↵Sergei Barannikov1-17/+4
(#155382) 11c61581 made the variable redundant. Also remove `Target`, which is apparently unused.
2025-08-26[TableGen][DecoderEmitter] Factor out DecoderTableBuilder (#155220)Sergei Barannikov1-67/+84
Extract the table building methods from FilterChooser into a separate class to relieve it of one of its responsibilities.
2025-08-25[TableGen][DecoderEmitter] Remove dead OPC_Fail (#155229)Sergei Barannikov1-16/+1
It can never be reached. It could be reached if we emitted an opcode that could fall outside the outermost scope, but emission of all such opcodes is guarded by `!isOutermostScope()`. That also means we never add fixups to the outermost scope, so avoid pushing an entry for it onto the stack.
2025-08-25[TableGen][DecoderEmitter] Remove PredicateNamespace (NFC) (#155211)Sergei Barannikov1-20/+12
There is no target named Thumb, so there is no need to make a special case for it. As part of this change, pass CodeGenTarget instead of DecoderEmitter to FilterChooser to remove dependency between the latter two.
2025-08-24[NFCI][MC][DecoderEmitter] Fix BitWidth for fixed length inst encodings ↵Rahul Joshi1-24/+73
(#154934) Change `InstructionEncoding` to use `Size` field to derive the BitWidth for fixed length instructions as opposed to the number of bits in the `Inst` field. For some backends, `Inst` has more bits than `Size`, but `Size` is the true size of the instruction. Also add validation that `Inst` has at least `Size * 8` bits and any bits in `Inst` beyond that are either 0 or unset. Verified no change in generated *GenDisassembler.inc files before/after.
2025-08-24[TableGen][DecoderEmitter] Add a couple of helper methods (NFC) (#155163)Sergei Barannikov1-27/+32
Replace push_back with more specific insertOpcode/insertUInt8.
2025-08-24[TableGen][DecoderEmitter] Refactor emitTableEntries (NFCI) (#155100)Sergei Barannikov1-98/+52
* Inline two small functions so that `emitTableEntries()` calls itself directly rather than through other functions. * Peel the last iteration of the loop as it is special. This should make the code easier to follow.
2025-08-24[TableGen][DecoderEmitter] Print the size of the decoder tables (#155139)Sergei Barannikov1-2/+1
So we can see the changes in table sizes after making changes to DecoderEmitter by simply running `grep DecoderTable`. Also, remove an unnecessary terminating 0 from the end of the tables.
2025-08-24[TableGen][DecoderEmitter] Fix indentation in generated code (NFC)Sergei Barannikov1-10/+10
`MCD::OPC_SoftFail` case in the generated `decodeInstruction()` was overindented, except for the closing brace, which was underindented.
2025-08-23[TableGen][DecoderEmitter] Repurpose Filter class (#155065)Sergei Barannikov1-79/+69
There was a lot of confusion about the responsibilities of Filter and FilterChooser. They created instances of each other and called each other's methods. Some of the methods had similar names and did similar things. This change moves most of the Filter members to FilterChooser and turns Filter into a supplementary class with short lifetime. FilterChooser constructs an array of (candidate) Filters, chooses the best performing one, and applies it to the given set of encodings, creating inferior FilterChoosers as necessary. The Filter array is then destroyed. All responsibility for generating the decoder table now lies with FilterChooser.
2025-08-23[TableGen][DecoderEmitter] Small refactoring (NFC)Sergei Barannikov1-11/+10
Few changes extracted from #155065 to make it smaller.
2025-08-23[TableGen][DecoderEmitter] Fix broken AdditionalEncoding support (#155057)Sergei Barannikov1-0/+2
We didn't have tests for AdditionalEncoding and none of the in-tree targets use this functionality, so I inadvertently broke it in #154288.
2025-08-22[TableGen][DecoderEmitter] Extract a couple of methods (NFC) (#155044)Sergei Barannikov1-51/+55
Extract `findBestFilter() const` searching for the best filter and move calls to `recurse()` out of it to a single place. Extract `dump()` as well, it is useful for debugging.
2025-08-23[TableGen][DecoderEmitter] Remove unused move constructor (NFC)Sergei Barannikov1-10/+0
Also delete no-op destructor declaration.
2025-08-22[TableGen][DecoderEmitter] Move a function to InstructionEncoding (NFC) ↵Sergei Barannikov1-19/+23
(#155038)
2025-08-23[TableGen][DecoderEmitter] Fix decoder reading bytes past instruction (#154916)Sergei Barannikov1-34/+74
See the added test. Before this change the decoder would first read the second byte, despite the fact that there are 1-byte instructions that could match: ``` /* 0 */ MCD::OPC_ExtractField, 8, 8, // Inst{15-8} ... /* 3 */ MCD::OPC_FilterValue, 0, 4, 0, // Skip to: 11 /* 7 */ MCD::OPC_Decode, 186, 2, 0, // Opcode: I16_0, DecodeIdx: 0 /* 11 */ MCD::OPC_FilterValue, 1, 4, 0, // Skip to: 19 /* 15 */ MCD::OPC_Decode, 187, 2, 0, // Opcode: I16_1, DecodeIdx: 0 /* 19 */ MCD::OPC_FilterValue, 2, 4, 0, // Skip to: 27 /* 23 */ MCD::OPC_Decode, 188, 2, 0, // Opcode: I16_2, DecodeIdx: 0 /* 27 */ MCD::OPC_ExtractField, 0, 1, // Inst{0} ... /* 30 */ MCD::OPC_FilterValue, 0, 4, 0, // Skip to: 38 /* 34 */ MCD::OPC_Decode, 189, 2, 1, // Opcode: I8_0, DecodeIdx: 1 /* 38 */ MCD::OPC_FilterValueOrFail, 1, /* 40 */ MCD::OPC_Decode, 190, 2, 1, // Opcode: I8_1, DecodeIdx: 1 /* 44 */ MCD::OPC_Fail, ``` There are no changes in the generated files. The only in-tree target that uses variable length decoder is M68k, which is free of decoding conflicts that could result in the decoder doing OOB access. This also fixes misaligned "Decoding Conflict" dump, prettified example output is shown in the second test.