author | Rahul Joshi <rjoshi@nvidia.com> | 2025-09-01 13:44:18 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-09-01 13:44:18 -0700 |
commit | dafffe262d6d1114fa83ec155241aad4e7793845 (patch) | |
tree | b375f4458f33ad8db8a1c13338c4a4b9fa146f26 /llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp | |
parent | 33d5a3b455d3bb0d0487dabb98728aeaa8cba03b (diff) | |
[LLVM][MC][DecoderEmitter] Add support to specialize decoder per bitwidth (#154865)
This change adds an option to specialize decoders per bitwidth, which
can help reduce the (compiled) code size of the decoder code.
**Current state**:
Currently, the code generated by the decoder emitter consists of two key
functions: `decodeInstruction` which is the entry point into the
generated code and `decodeToMCInst` which is invoked when a decode op is
reached while traversing through the decoder table. Both functions are
templated on `InsnType` which is the raw instruction bits that are
supplied to `decodeInstruction`.
Several backends call `decodeInstruction` with different `InsnType`
types, leading to several template instantiations of these functions in
the final code. As an example, AMDGPU instantiates these functions with
`DecoderUInt128` for decoding 96/128-bit instructions,
`uint64_t` for decoding 64-bit instructions, and `uint32_t` for decoding
32-bit instructions. Since there is just one `decodeToMCInst` in the
generated code, it has code that handles decoding for *all* instruction
sizes. However, the decoders emitted for different instruction sizes
rarely have any intersection with each other. That means, in the AMDGPU
case, the instantiation with `InsnType == DecoderUInt128` has decoder code
for 32/64-bit instructions that is *never exercised*. Conversely, the
instantiation with `InsnType == uint64_t` has decoder code for
32/96/128-bit instructions that is never exercised. This leads to
unnecessary dead code in the generated disassembler binary (that the
compiler cannot eliminate by itself).
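The shape of the problem can be sketched as follows. This is a minimal illustration and not the generated code: `decodeToMCInstShared` and the returned tag values are hypothetical names chosen for the example. Because the decoder index is a runtime value, every instantiation must compile every `case`, even though a given `InsnType` only ever reaches the cases belonging to its own bitwidth.

```cpp
#include <cstdint>

// Hypothetical stand-in for a single shared decodeToMCInst. The switch
// covers decoders for every instruction size, so each instantiation
// carries all of them.
template <typename InsnType>
constexpr int decodeToMCInstShared(unsigned DecodeIdx, InsnType /*Insn*/) {
  switch (DecodeIdx) {
  case 0:
    return 16; // decoder only reachable from a 16-bit table
  case 1:
    return 32; // decoder only reachable from a 32-bit table
  case 2:
    return 48; // decoder only reachable from a 48-bit table
  default:
    return -1; // unknown decoder index
  }
}

// Both instantiations contain all three cases, although the uint16_t one
// would only be handed index 0 by its decoder table in practice.
static_assert(decodeToMCInstShared<uint16_t>(0, 0) == 16);
static_assert(decodeToMCInstShared<uint32_t>(1, 0) == 32);
```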
**New state**:
With this change, we introduce an option
`specialize-decoders-per-bitwidth`. Under this mode, the DecoderEmitter
generates several versions of the `decodeToMCInst` function, one for
each bitwidth. The code is still templated, but backends must specify,
for each `InsnType` used, the bitwidth of the instructions that the type
represents, using a type trait `InsnBitWidth`. This enables the
templated code to choose the right variant of `decodeToMCInst`. Under
this mode, a particular instantiation instantiates only a single
generated variant of `decodeToMCInst`, and that variant includes only
the decoders applicable to a single bitwidth. This eliminates the code
duplication through instantiation and reduces code size.
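A minimal sketch of trait-based selection, assuming C++17, could look like the following. The commit message does not show how the generated code picks a variant, so the `if constexpr` dispatch is an assumption, and every name here (`InsnBitWidthSketch`, `decode16`, `decode32`, `decode48`, `decodeToMCInstSketch`) is hypothetical. Only the idea of mapping each `InsnType` to a bitwidth through a trait comes from the description above.

```cpp
#include <cstdint>

// Hypothetical stand-in for the InsnBitWidth trait. The primary template
// is 0 so that a type without a registered bitwidth is rejected below.
template <typename InsnType> constexpr uint32_t InsnBitWidthSketch = 0;
template <> constexpr uint32_t InsnBitWidthSketch<uint16_t> = 16;
template <> constexpr uint32_t InsnBitWidthSketch<uint32_t> = 32;
// A 64-bit container is used here to carry 48-bit instructions.
template <> constexpr uint32_t InsnBitWidthSketch<uint64_t> = 48;

// One hypothetical decoder body per bitwidth. Each returns a tag that
// records which variant ran and with which decoder index.
template <typename InsnType>
constexpr int decode16(unsigned Idx, InsnType) { return 1600 + int(Idx); }
template <typename InsnType>
constexpr int decode32(unsigned Idx, InsnType) { return 3200 + int(Idx); }
template <typename InsnType>
constexpr int decode48(unsigned Idx, InsnType) { return 4800 + int(Idx); }

// Only the branch matching the trait value is instantiated, so each
// InsnType pulls in the decoders for exactly one bitwidth.
template <typename InsnType>
constexpr int decodeToMCInstSketch(unsigned DecodeIdx, InsnType Insn) {
  constexpr uint32_t Width = InsnBitWidthSketch<InsnType>;
  static_assert(Width != 0, "no bitwidth registered for this InsnType");
  if constexpr (Width == 16)
    return decode16(DecodeIdx, Insn);
  else if constexpr (Width == 32)
    return decode32(DecodeIdx, Insn);
  else
    return decode48(DecodeIdx, Insn);
}

static_assert(decodeToMCInstSketch<uint16_t>(3, 0) == 1603);
static_assert(decodeToMCInstSketch<uint64_t>(1, 0) == 4801);
```

In this sketch the decoder index 3 for `uint16_t` and index 3 for `uint32_t` refer to different decoders, which mirrors the per-bitwidth index spaces the option introduces.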
Additionally, under this mode, decoders are uniqued only within a given
bitwidth (as opposed to across all bitwidths without this option), so
the decoder index values assigned are smaller and consume fewer bytes in
their ULEB128 encoding. As a result, the generated decoder tables can
also shrink.
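To see why smaller indices help, recall that ULEB128 stores 7 payload bits per byte, so any value below 128 fits in one byte and any value below 16384 fits in two. The helper below is an illustrative sketch written for this note (`uleb128Size` is a hypothetical name, not a function taken from the patch).

```cpp
#include <cstdint>

// Number of bytes needed to encode Value as ULEB128: each byte carries
// 7 bits of payload, and at least one byte is always emitted.
constexpr unsigned uleb128Size(uint64_t Value) {
  unsigned Size = 0;
  do {
    Value >>= 7;
    ++Size;
  } while (Value != 0);
  return Size;
}

static_assert(uleb128Size(127) == 1);   // largest one-byte index
static_assert(uleb128Size(128) == 2);   // first index needing two bytes
static_assert(uleb128Size(16384) == 3); // first index needing three bytes
```

If uniquing per bitwidth keeps more decoder indices under one of these thresholds, each table entry that references such an index is a byte shorter.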
Adopt this feature for the AMDGPU and RISCV backends. In a release build,
this results in a net 55% reduction in the .text size of
libLLVMAMDGPUDisassembler.so and a 5% reduction in the .rodata size. For
RISCV, which today uses a single `uint64_t` type, this results in a 3.7%
increase in code size (expected as we instantiate the code 3 times now).
Actual measured sizes are as follows:
```
Baseline commit: 72c04bb882ad70230bce309c3013d9cc2c99e9a7
Configuration: Ubuntu clang version 18.1.3, release build with asserts disabled.

AMDGPU     Before     After      Change
======================================================
.text      612327     275607     55% reduction
.rodata    369728     351336     5% reduction

RISCV      Before     After      Change
======================================================
.text      47407      49187      3.7% increase
.rodata    35768      35839      0.1% increase
```
Diffstat (limited to 'llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp')
-rw-r--r-- | llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp | 16 |
1 files changed, 9 insertions, 7 deletions
diff --git a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
index de1bdb4..c8b89f5 100644
--- a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
+++ b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
@@ -558,7 +558,7 @@ static DecodeStatus decodeXqccmpRlistS0(MCInst &Inst, uint32_t Imm,
   return decodeZcmpRlist(Inst, Imm, Address, Decoder);
 }
 
-static DecodeStatus decodeCSSPushPopchk(MCInst &Inst, uint32_t Insn,
+static DecodeStatus decodeCSSPushPopchk(MCInst &Inst, uint16_t Insn,
                                         uint64_t Address,
                                         const MCDisassembler *Decoder) {
   uint32_t Rs1 = fieldFromInstruction(Insn, 7, 5);
@@ -701,6 +701,12 @@ static constexpr DecoderListEntry DecoderList32[]{
     {DecoderTableZdinxRV32Only32, {}, "RV32-only Zdinx (Double in Integer)"},
 };
 
+// Define bitwidths for various types used to instantiate the decoder.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint16_t> = 16;
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint32_t> = 32;
+// Use uint64_t to represent 48 bit instructions.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint64_t> = 48;
+
 DecodeStatus RISCVDisassembler::getInstruction32(MCInst &MI, uint64_t &Size,
                                                  ArrayRef<uint8_t> Bytes,
                                                  uint64_t Address,
@@ -711,9 +717,7 @@ DecodeStatus RISCVDisassembler::getInstruction32(MCInst &MI, uint64_t &Size,
   }
   Size = 4;
 
-  // Use uint64_t to match getInstruction48. decodeInstruction is templated
-  // on the Insn type.
-  uint64_t Insn = support::endian::read32le(Bytes.data());
+  uint32_t Insn = support::endian::read32le(Bytes.data());
 
   for (const DecoderListEntry &Entry : DecoderList32) {
     if (!Entry.haveContainedFeatures(STI.getFeatureBits()))
@@ -759,9 +763,7 @@ DecodeStatus RISCVDisassembler::getInstruction16(MCInst &MI, uint64_t &Size,
   }
   Size = 2;
 
-  // Use uint64_t to match getInstruction48. decodeInstruction is templated
-  // on the Insn type.
-  uint64_t Insn = support::endian::read16le(Bytes.data());
+  uint16_t Insn = support::endian::read16le(Bytes.data());
 
   for (const DecoderListEntry &Entry : DecoderList16) {
     if (!Entry.haveContainedFeatures(STI.getFeatureBits()))