|
values
- Currently, an intrinsic can have at most 9 return values. If new
intrinsics require more than 9 return values, additional IIT_STRUCTxxx
values need to be added to support them. Instead, this patch unifies
them into a single IIT_STRUCT followed by a byte that encodes the
number of return values, from the minimum of 2 (encoded as 0) to the
maximum of 257 (encoded as 255).
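As a rough illustration of the count encoding described above, the byte can be thought of as the count minus 2. The helper names below are hypothetical and are not taken from the patch; this is only a sketch of the arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not the actual LLVM API): map a return-value count
// in [2, 257] onto the single byte that follows IIT_STRUCT, and back.
inline uint8_t encodeStructCount(unsigned NumRets) {
  assert(NumRets >= 2 && NumRets <= 257 && "count not representable");
  return static_cast<uint8_t>(NumRets - 2);
}

inline unsigned decodeStructCount(uint8_t Byte) {
  // The bias of 2 makes every byte value meaningful: 0 -> 2, 255 -> 257.
  return static_cast<unsigned>(Byte) + 2;
}
```

Biasing by 2 avoids wasting byte values on counts that never need a struct encoding.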
|
|
(#158426)
cPTR is a wildcard CHERI capability value type, used analogously to iPTR. This allows TableGen patterns to abstract over CHERI capability widths.
Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com>
|
|
(#159778)
Extract error reporting code emitted by CodeEmitterGen into
MCCodeEmitter static member functions.
Additionally, remove unused ErrorHandling.h header from several files.
|
|
Previously, `bits<0>` only had an effect if `ignore-non-decodable-operands`
wasn't specified. Handle it even if the option was specified. This
should allow for a smoother transition to the removal of the option.
The change revealed a couple of inaccuracies in RISCV compressed
instruction definitions.
* `C_ADDI4SPN` has `bits<5> rs1` field, but `rs1` is not encoded. It
should be `bits<0>`.
* `C_ADDI16SP` has `bits<5> rd` in the base class, but it is unused
since `Inst{11-7}` is overwritten with constant bits.
We should instead set `rd = 2` and `Inst{11-7} = rd`. There are a couple
of alternative fixes, but this one is the shortest.
|
|
|
|
### Current state
We have the FilterChooser class, which can be thought of as a **tree of
encodings**. Tree nodes are instances of FilterChooser itself, and come
in two types:
* A node containing a single encoding that has *constant* bits in the
specified bit range, a.k.a. singleton node.
* A node containing only child nodes, where each child represents a set
of encodings that have the same *constant* bits in the specified bit
range.
Either of these nodes can have an additional child, which represents a
set of encodings that have some *unknown* bits in the same bit range.
As can be seen, the **data structure is very high level**.
The encoding tree represented by FilterChooser is then converted into a
finite-state machine (FSM), represented as a **byte array**. The
translation is straightforward: for each node of the tree we emit a
sequence of opcodes that check encoding bits and predicates for each
encoding. For a singleton node we also emit a terminal "decode" opcode.
The translation is done in one go, and this has negative consequences:
* We miss optimization opportunities.
* We have to use "fixups" when encoding transitions in the FSM since we
don't know the size of the data we want to jump over in advance. We have
to emit the data first and then fix up the location of the jump. This
means the fixup size has to be large enough to encode the longest jump,
so **most of the transitions are encoded inefficiently**.
* Finally, when converting the FSM into human-readable form, we have to
**decode the byte array we've just emitted**. This is also done in one
go, so we **can't do any pretty printing**.
### This PR
We introduce an intermediate data structure, the decoder tree, that can be
thought of as the **AST of the decoder program**.
This data structure is **low level** and as such allows for optimization
and analysis.
It resolves all the issues listed above. We now can:
* Emit more optimal opcode sequences.
* Compute the size of the data to be emitted in advance, avoiding
fixups.
* Do pretty printing.
Serialization is done by a new class, DecoderTableEmitter, which
converts the AST into an FSM in **textual form**, streamed right into the
output file.
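To illustrate why knowing sizes in advance avoids fixups, here is a minimal sketch. It assumes a simplified node model in which an interior node emits its own bytes, then a ULEB128 length of its body, then the body. `SketchNode`, `ulebSize` and `encodedSize` are hypothetical names and do not reflect the actual decoder tree classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Number of bytes needed to encode Value as ULEB128 (7 payload bits per byte).
inline size_t ulebSize(uint64_t Value) {
  size_t Size = 1;
  while (Value >= 0x80) {
    Value >>= 7;
    ++Size;
  }
  return Size;
}

// Hypothetical AST node: a leaf emits OwnBytes; an interior node emits
// OwnBytes, then the ULEB128-encoded size of its body, then the body.
struct SketchNode {
  size_t OwnBytes = 0;
  std::vector<std::unique_ptr<SketchNode>> Children;
};

// Bottom-up size computation: the jump length is known before anything is
// emitted, so it can use the shortest encoding instead of a fixed-size fixup.
inline size_t encodedSize(const SketchNode &N) {
  assert(N.OwnBytes > 0 && "every node emits at least an opcode byte");
  if (N.Children.empty())
    return N.OwnBytes;
  size_t Body = 0;
  for (const auto &C : N.Children)
    Body += encodedSize(*C);
  return N.OwnBytes + ulebSize(Body) + Body;
}
```

In this model a body of 203 bytes needs a 2-byte length, whereas a fixup sized for the longest possible jump would spend the same fixed width on every transition.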
### Results
* The new approach immediately resulted in 12% total table size savings
across all in-tree targets, without implementing any optimizations on
the AST. Many tables observe ~20% size reduction.
* The generated file is much more readable.
* The implementation is arguably simpler and more straightforward (the
diff is only +150~200 lines, which feels rather small for the benefits
the change gives).
|
|
|
|
Replace the target uses of PointerLikeRegClass with RegClassByHwMode
|
|
This is a generalization of the LookupPtrRegClass mechanism.
AMDGPU has several use cases for swapping the register class of
instruction operands based on the subtarget, but none of them
really fit into the box of being pointer-like.
The current system requires manual management of an arbitrary integer
ID. For the AMDGPU use case, this would end up being around 40 new
entries to manage.
This just introduces the base infrastructure. I have ports of all
the target specific usage of PointerLikeRegClass ready.
|
|
|
|
|
|
This is a minor fix from comment
https://github.com/llvm/llvm-project/pull/157965/files#r2347317186
introduced in #157965.
|
|
|
|
(#159329)
The pattern optimizations in GlobalISelMatchTable.cpp can extract common
predicates out of pattern alternatives by putting the pattern alternatives into
a GroupMatcher and moving common predicates into the GroupMatcher's predicate
list. This patch adds checks to avoid hoisting a common predicate before
matchers that record named operands that the predicate uses, which would lead
to segfaults when the imported patterns are matched.
See the added test for a concrete example inspired by the AMDGPU backend.
This fixes a bug encountered in #143881.
|
|
RISC-V has over a million bytes in the table.
|
|
Also turn the method into a static function so it can be used without
an instance of the class.
|
|
`tmp` is always of integer type, so we can use bitwise OR and shift.
|
|
(#158182)
Tablegen would generate code to access TargetResourceIndices with the
processor ID.
The TargetProcResourceIndexStart[] array is generated only for processors
that have itineraries. Processors that do not have itineraries are excluded
from the array. When a target mixes both kinds of processors, the processor
ID may exceed the array size and cause an error.
This patch generates a table mapping each processor with itineraries to its
resource index, so that the scheduler can get the correct resource index
from the processor ID.
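The mapping idea can be sketched as follows. This is not the generated code; `buildProcToCompactIndex` and `compactIndexFor` are hypothetical names, and the sketch assumes processors without itineraries are marked with -1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: HasItineraries[ProcID] says whether the processor has
// itineraries. The result maps every processor ID to its position in the
// compact per-itinerary array, or -1 if the processor has no entry there.
inline std::vector<int>
buildProcToCompactIndex(const std::vector<bool> &HasItineraries) {
  std::vector<int> Map(HasItineraries.size(), -1);
  int Next = 0;
  for (size_t ProcID = 0; ProcID < HasItineraries.size(); ++ProcID)
    if (HasItineraries[ProcID])
      Map[ProcID] = Next++;
  return Map;
}

// Look up through the map instead of indexing the compact array directly
// with the processor ID, which could run past its end.
inline int compactIndexFor(const std::vector<int> &Map, size_t ProcID) {
  assert(ProcID < Map.size() && "unknown processor ID");
  return Map[ProcID];
}
```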
|
|
OPC_Decode is a specialized OPC_TryDecode. The difference between them
is that OPC_TryDecode performs a "completeness check", while OPC_Decode
asserts that the check passes.
The check is just a boolean test, which is nothing compared to the
complexity of the decoding process, so there is no point in having a
special opcode that optimizes the check.
|
|
Extracted from #155889, which removes inclusion of `MCDecoderOps.h`.
|
|
SmallSetVector is too optimistic: there are usually more than 16 unique
decoders and predicates. Modernize `typedef` to `using` while here.
|
|
|
|
string (NFC) (#159089)
These functions will see more uses in a future patch.
This also resolves a FIXME.
|
|
* Use streams to avoid dealing with std::string
* Print operand masks in hex
* Make the output more succinct
|
|
(#158789)
DecoderTableBuilder will be removed. Move the methods that will remain
out of the class.
|
|
|
|
|
|
Eliminate `doesOpcodeNeedPredicate` and instead have
`emitPredicateMatch` return true if any predicates were generated.
Delegate actual predicate generation in `emitPredicateMatch` to
`SubtargetFeatureInfo::emitMCPredicateCheck`. Additionally, remove the
redundant parentheses around the predicate conditions in the generated
`checkDecoderPredicate` function.
Note that for ARM/AMDGPU this reduces the total # of predicates
generated by a few. It seems the old code would sometimes generate
duplicate predicates which were identical in semantics but one had an
extra pair of parentheses (i.e., `X` and `(X)`). `emitMCPredicateCheck`
does not seem to have that issue.
|
|
Several code cleanup changes in code to emit decoder tables:
- Start comments on each line at a fixed column for readability.
- Combine repeated code to decode and emit ULEB128 into a single
function.
- Add helper `getDecoderOpName` to print decoder op.
- Print Filter/CheckField/predicate index values with those opcodes.
|
|
(#158505)
So that it can be used in CodeEmitterGen / VarLenCodeEmitterGen.
|
|
To avoid passing them to member functions.
|
|
Make the func. and arg. fields of IntrinsicsToAttributesMap use
adaptive sizes based on the input instead of the hardcoded 8 bits/8 bits.
This will ease the pressure of adding new intrinsics in private
downstreams.
func. attr bitsize will become 7(127/128) vs 8(255/256)
|
|
Follow-up to #156358. The original change didn't take into account
operands with "all zeros" encoding, now fixed.
|
|
This adds value types for representing capability types, enabling their use in instruction selection and other parts of the backend.
These types are distinguished from each other only by size. This is sufficient, at least today, because no existing CHERI configuration supports multiple capability sizes simultaneously. Hybrid configurations supporting intermixed integral pointers and capabilities do exist, and are one of the reasons why these value types are needed beyond existing integral types.
Co-authored-by: David Chisnall <theraven@theravensnest.org>
Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com>
|
|
Do not exit when the first decoding conflict is encountered. Instead
record the conflict and continue to report any additional decoding
conflicts and exit fatally after all instructions have been processed.
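The record-and-continue approach can be sketched as below. This is a deliberately simplified model in which two encodings conflict when their bits are identical; the real conflict detection is more involved, and `SketchEncoding` and `findConflicts` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified encoding record.
struct SketchEncoding {
  std::string Name;
  uint64_t Bits;
};

// Collect every conflict instead of stopping at the first one. The caller
// reports all of them and fails only after the whole list was processed.
inline std::vector<std::pair<std::string, std::string>>
findConflicts(const std::vector<SketchEncoding> &Encodings) {
  std::map<uint64_t, std::string> Seen;
  std::vector<std::pair<std::string, std::string>> Conflicts;
  for (const SketchEncoding &E : Encodings) {
    assert(!E.Name.empty() && "an encoding needs a name to be reported");
    auto Result = Seen.emplace(E.Bits, E.Name);
    if (!Result.second) // Bits already claimed: record it and keep going.
      Conflicts.emplace_back(Result.first->second, E.Name);
  }
  return Conflicts;
}
```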
|
|
Change various `InstBits` tables to have an entry only for non-pseudo
target instructions and adjust the indexing into these tables
accordingly.
Some minor refactoring related to this:
- Use early return after handling variable length encodings
- Reduce the scope of anonymous namespace to just the class declaration.
Example reductions in these table sizes for some targets:
```
Target   FirstSupportedOpcode   Reduction in size
AMDGPU   10813                  10813 * 16 = 168KB
RISCV    12051                  12051 * 8  = 94KB
```
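The adjusted indexing can be sketched as follows, assuming the table now starts at the first non-pseudo opcode. `lookupInstBits` is a hypothetical helper, not the generated accessor.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: InstBits holds entries only for opcodes at or above
// FirstSupportedOpcode, so the opcode is rebased before indexing.
inline uint64_t lookupInstBits(const std::vector<uint64_t> &InstBits,
                               unsigned FirstSupportedOpcode,
                               unsigned Opcode) {
  assert(Opcode >= FirstSupportedOpcode &&
         "pseudo instructions have no entry in the table");
  unsigned Index = Opcode - FirstSupportedOpcode;
  assert(Index < InstBits.size() && "opcode out of range");
  return InstBits[Index];
}
```

The saving is the number of skipped leading entries times the entry size, which is what the table above computes.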
|
|
`Predicates` and `Features` fields serve the same purpose. They should
be kept in sync, but not all predicates are based on features. This
resulted in introducing dummy features for that reason only.
This patch removes `Features` field and changes TableGen emitters to use
`Predicates` instead.
Historically, predicates were written with the assumption that the
checking code will be used in `SelectionDAGISel` subclasses, meaning
they will have access to the subclass variables, such as `Subtarget`.
There are no such variables in the generated
`GenSubtargetInfo::getHwModeSet()`, so we need to provide them. This can
be achieved by subclassing `HwModePredicateProlog`, see an example in
`Hexagon.td`.
|
|
This will make it possible for tablegen to make subtarget
dependent decisions without adding new arguments to every
target.
---------
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
|
|
|
|
This change introduces the OPC_Scope opcode, whose only purpose is to record
a continuation point to resume at if a subsequent opcode fails.
Each OPC_Scope pushes an entry onto the scope stack; an entry is popped
if an opcode in the scope fails.
Previously, we recorded this information on several opcodes; it has been
removed from them. A series of such opcodes often referred to the same
continuation point; this information is now recorded in one place,
reducing table sizes in most cases. The average reduction is 1.1%; some
tables observe up to a 7% reduction in size.
The new behavior of those opcodes is "check or leave scope". If we're in
the outermost scope (scope stack is empty), they act as "check or fail".
There is one opcode, OPC_FilterValueOrSkip, that behaves like the old
OPC_FilterValue. It is special because it acts as a case of a switch
statement and has nothing to do with scopes. (If a case fails, we should
try the next case instead of leaving the current scope.)
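The "check or leave scope" behavior can be sketched with a toy interpreter. The byte layout below (single-byte operands, a scope length counted from the end of the OPC_Scope operand) is an assumption made for illustration and does not match the real table format; `SketchOp` and `runSketchTable` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy opcodes; the names mirror the description, the encoding is invented.
enum SketchOp : uint8_t { OPC_Scope, OPC_CheckField, OPC_Decode };

// Returns the decoder ID on success, or -1 if decoding fails.
inline int runSketchTable(const std::vector<uint8_t> &Table, uint32_t Insn) {
  std::vector<size_t> ScopeStack; // continuation points, innermost last
  size_t Ptr = 0;
  while (Ptr < Table.size()) {
    switch (Table[Ptr++]) {
    case OPC_Scope: {
      uint8_t Len = Table[Ptr++];
      ScopeStack.push_back(Ptr + Len); // resume here if the body fails
      break;
    }
    case OPC_CheckField: {
      uint8_t Start = Table[Ptr++];
      uint8_t Width = Table[Ptr++];
      uint8_t Expected = Table[Ptr++];
      assert(Width > 0 && Width < 32 && Start + Width <= 32);
      uint32_t Field = (Insn >> Start) & ((1u << Width) - 1);
      if (Field == Expected)
        break;
      if (ScopeStack.empty())
        return -1; // outermost scope: "check or fail"
      Ptr = ScopeStack.back(); // "check or leave scope"
      ScopeStack.pop_back();
      break;
    }
    case OPC_Decode:
      return Table[Ptr]; // terminal opcode: operand is the decoder ID
    default:
      return -1;
    }
  }
  return -1;
}
```

Several checks inside one scope share a single continuation point, which is where the size reduction comes from in this model.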
|
|
The added tests used to crash when attempting to dereference a nullptr
MIOpInfo or call MIOpInfo->getArg(0) on an empty MIOpInfo dag.
|
|
|
|
|
|
(#156973)
- Replace manual code to convert a `BitsInit` to a uint64_t by using
`convertInitializerToInt` where applicable.
- Add `BitsInit::convertKnownBitsToInt` to handle existing patterns in
DFAEmitter.cpp and RegisterInfoEmitter.cpp.
- Consolidate 3 copies of the same function in X86 emitters into a
single function.
|
|
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
|
|
Print the source location of the instruction definition in comment next
to the enum value for each instruction. To make this more readable,
change formatting of the instruction enums to be better aligned.
Example output:
```
VLD4qWB_register_Asm_8 = 573, // (ARMInstrNEON.td:8849)
VMOVD0                 = 574, // (ARMInstrNEON.td:6337)
VMOVDcc                = 575, // (ARMInstrVFP.td:2466)
VMOVHcc                = 576, // (ARMInstrVFP.td:2474)
VMOVQ0                 = 577, // (ARMInstrNEON.td:6341)
```
|
|
GenInstrInfo.inc. (#156960)
The name is most interesting and if you really need the number you can
use the name to find the entry in the enum or use the first field of the
table row.
|
|
There are two classes of operands that DecoderEmitter cannot currently
handle:
1. Operands that do not participate in instruction encoding.
2. Operands whose encoding contains only 1s and 0s.
Because of this, targets developed various workarounds. Some targets
insert missing operands after an instruction has been (incompletely)
decoded, others take the missing operands into account when printing the
instruction. Some targets do neither and fail to correctly
disassemble some instructions.
This patch makes it possible to decode both classes of operands and
allows removing the existing workarounds.
For the case of operand with no contribution to instruction encoding,
one should now add `bits<0> OpName` field to instruction encoding
record. This will make DecoderEmitter generate a call to the decoder
function specified by the operand's DecoderMethod. The function has a
signature different from the usual one and looks like this:
```
static DecodeStatus DecodeImm42Operand(MCInst &Inst,
                                       const MCDisassembler *Decoder) {
  Inst.addOperand(MCOperand::createImm(42));
  return DecodeStatus::Success;
}
Notably, encoding bits are not passed to it (since there are none).
There is nothing special about the second case, the operand bits are
passed as usual. The difference is that before this change, the function
was not called if all the bits of the operand were known (no '?' in the
operand encoding).
There are two options controlling the behavior. Passing an option
enables the old behavior. They exist to allow smooth transition to the
new behavior. They are temporary (yeah, I know) and will be removed once
all targets migrate, possibly giving some more time to downstream
targets.
Subsequent patches in the stack enable the new behavior on some in-tree
targets.
|
|
|
|
bitwidths mismatch (#156734)
|