author | Simon Tatham <simon.tatham@arm.com> | 2022-07-26 09:20:41 +0100 |
---|---|---|
committer | Simon Tatham <simon.tatham@arm.com> | 2022-07-26 09:35:30 +0100 |
commit | 55f1fbf005fef1e4024b2b44db0842f23fc5ea64 (patch) | |
tree | f78e6e138b37492871acacf84d01df0481a7261d /llvm/tools/llvm-objdump/llvm-objdump.cpp | |
parent | c4b6e5f9500fc3c44d587896eb402c8ede21eb10 (diff) | |
[MC,llvm-objdump,ARM] Target-dependent disassembly resync policy.

Currently, when llvm-objdump is disassembling a code section and
encounters a point where no instruction can be decoded, it uses the
same policy on all targets: consume one byte of the section, emit it
as "<unknown>", and try disassembling from the next byte position.

On an architecture where instructions are always 4 bytes long and
4-byte aligned, this makes no sense at all. If a 4-byte word cannot be
decoded as an instruction, then the next place that a valid
instruction could //possibly// be found is 4 bytes further on.
Disassembling from a misaligned address can't possibly produce
anything that the code generator intended, or that the CPU would even
attempt to execute.

This patch introduces a new MCDisassembler virtual method called
`suggestBytesToSkip`, which allows each target to choose its own
resynchronization policy. For Arm (as opposed to Thumb) and AArch64,
I've filled in the new method to return a fixed width of 4.
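As a rough illustration of the hook described above, the following sketch models a base-class default and a fixed-width override. It is not the patch's code: `ByteSpan`, `DisassemblerSketch`, `FixedWidthSketch` and `checkPolicies` are hypothetical stand-ins so that the example compiles without LLVM headers, and the `uint64_t` return type is an assumption. The two parameters mirror the arguments visible at the call sites in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for llvm::ArrayRef<uint8_t>: a pointer plus a length.
struct ByteSpan {
  const uint8_t *Data;
  size_t Size;
};

// Hypothetical stand-in for MCDisassembler, reduced to the one hook.
struct DisassemblerSketch {
  virtual ~DisassemblerSketch() = default;

  // Default policy: skip one byte, i.e. the behavior from before the patch.
  virtual uint64_t suggestBytesToSkip(ByteSpan /*Bytes*/,
                                      uint64_t /*Address*/) const {
    return 1;
  }
};

// Fixed-width policy, as the message describes for Arm and AArch64:
// an undecodable word is skipped as a whole.
struct FixedWidthSketch : DisassemblerSketch {
  uint64_t suggestBytesToSkip(ByteSpan /*Bytes*/,
                              uint64_t /*Address*/) const override {
    return 4;
  }
};

// Small self-check of the two policies; the byte values are arbitrary.
inline void checkPolicies() {
  const uint8_t Word[4] = {0xff, 0xff, 0xff, 0xff};
  const ByteSpan Bytes{Word, sizeof(Word)};
  assert(DisassemblerSketch().suggestBytesToSkip(Bytes, 0) == 1);
  assert(FixedWidthSketch().suggestBytesToSkip(Bytes, 0) == 4);
}
```

Keeping the default in the base class means a target that does not override the hook keeps the one-byte behavior.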

Thumb is a more interesting case, because the criterion for
identifying 2-byte and 4-byte instruction encodings is very simple,
and doesn't require the particular instruction to be recognized. So
`suggestBytesToSkip` is also passed an ArrayRef of the bytes in
question, so that it can take that into account. The new test case
shows Thumb disassembly skipping over two unrecognized instructions,
and identifying one as 2-byte and one as 4-byte.
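The Thumb width criterion mentioned above can be sketched as follows. This is an illustration based on the architectural rule that a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 4-byte encoding, not a copy of the patch. `thumbBytesToSkip` is a hypothetical helper, little-endian halfwords are assumed, and the fallback of 2 for a span shorter than one halfword is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: guess the width of the Thumb instruction that starts
// at Bytes[0], without decoding it. Assumes little-endian halfwords.
inline uint64_t thumbBytesToSkip(const uint8_t *Bytes, size_t Size) {
  assert((Bytes != nullptr || Size == 0) && "non-empty span needs data");

  // Without a full halfword to inspect, fall back to the minimum Thumb
  // instruction width (an assumption of this sketch).
  if (Size < 2)
    return 2;

  const uint16_t Halfword =
      static_cast<uint16_t>(Bytes[0] | (Bytes[1] << 8));

  // Top five bits 0b11101, 0b11110 or 0b11111 mark the first halfword of a
  // 4-byte encoding; that is the same as the halfword being >= 0xE800.
  return Halfword >= 0xE800 ? 4 : 2;
}
```

For example, the bytes `00 bf` (halfword 0xBF00) are classified as a 2-byte encoding, while `00 f0` (halfword 0xF000) starts a 4-byte one. A caller would still clamp the result to the number of bytes remaining.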

For targets other than Arm and AArch64, this is NFC: the base class
implementation of `suggestBytesToSkip` still returns 1, so that the
existing behavior is unchanged. Other targets can fill in their own
implementations as they see fit; I haven't attempted to choose a new
behavior for each one myself.

I've updated all the call sites of `MCDisassembler::getInstruction` in
llvm-objdump, and also one in sancov, which was the only other place I
spotted the same idiom of `if (Size == 0) Size = 1` after a call to
`getInstruction`.
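The call-site idiom can be sketched as a standalone loop. The names here (`walkSection`, `DecodeFn`, `SkipFn`) are hypothetical and stand in for `getInstruction` and `suggestBytesToSkip`; the sketch only shows how the suggested skip is clamped to the bytes that remain, and it assumes the skip callback never returns 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical callbacks. Decode returns the instruction size, or 0 when
// nothing could be decoded; Skip returns the suggested resync distance.
using DecodeFn = std::function<uint64_t(const uint8_t *, size_t)>;
using SkipFn = std::function<uint64_t(const uint8_t *, size_t)>;

// Walk a section and return the offset at which each decode attempt starts.
inline std::vector<size_t> walkSection(const std::vector<uint8_t> &Bytes,
                                       const DecodeFn &Decode,
                                       const SkipFn &Skip) {
  std::vector<size_t> Starts;
  for (size_t Index = 0; Index != Bytes.size();) {
    const uint8_t *This = Bytes.data() + Index;
    const size_t Remaining = Bytes.size() - Index;

    uint64_t Size = Decode(This, Remaining);
    if (Size == 0) {
      // Clamp so that a suggested skip never runs past the end of the data.
      Size = std::min<uint64_t>(Remaining, Skip(This, Remaining));
    }
    assert(Size > 0 && Size <= Remaining && "must advance within bounds");

    Starts.push_back(Index);
    Index += static_cast<size_t>(Size);
  }
  return Starts;
}
```

With a 10-byte section in which nothing decodes, a skip of 1 yields ten attempts, while a skip of 4 yields attempts at offsets 0, 4 and 8, the last one clamped to the 2 bytes that remain. The explicit `std::min<uint64_t>` is a choice made for this sketch, so that the two arguments share a type on platforms where `size_t` and `uint64_t` differ.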

Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D130357
Diffstat (limited to 'llvm/tools/llvm-objdump/llvm-objdump.cpp')
-rw-r--r-- | llvm/tools/llvm-objdump/llvm-objdump.cpp | 25 |
1 files changed, 15 insertions, 10 deletions
diff --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp
index c486088..5feebc5 100644
--- a/llvm/tools/llvm-objdump/llvm-objdump.cpp
+++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp
@@ -1022,10 +1022,12 @@ static void collectLocalBranchTargets(
     // Disassemble a real instruction and record function-local branch labels.
     MCInst Inst;
     uint64_t Size;
-    bool Disassembled = DisAsm->getInstruction(
-        Inst, Size, Bytes.slice(Index - SectionAddr), Index, nulls());
+    ArrayRef<uint8_t> ThisBytes = Bytes.slice(Index - SectionAddr);
+    bool Disassembled =
+        DisAsm->getInstruction(Inst, Size, ThisBytes, Index, nulls());
     if (Size == 0)
-      Size = 1;
+      Size = std::min(ThisBytes.size(),
+                      DisAsm->suggestBytesToSkip(ThisBytes, Index));
 
     if (Disassembled && MIA) {
       uint64_t Target;
@@ -1068,10 +1070,11 @@ static void addSymbolizer(
   for (size_t Index = 0; Index != Bytes.size();) {
     MCInst Inst;
     uint64_t Size;
-    DisAsm->getInstruction(Inst, Size, Bytes.slice(Index), SectionAddr + Index,
-                           nulls());
+    ArrayRef<uint8_t> ThisBytes = Bytes.slice(Index - SectionAddr);
+    DisAsm->getInstruction(Inst, Size, ThisBytes, Index, nulls());
     if (Size == 0)
-      Size = 1;
+      Size = std::min(ThisBytes.size(),
+                      DisAsm->suggestBytesToSkip(ThisBytes, Index));
     Index += Size;
   }
   ArrayRef<uint64_t> LabelAddrsRef = SymbolizerPtr->getReferencedAddresses();
@@ -1538,11 +1541,13 @@ static void disassembleObject(const Target *TheTarget, ObjectFile &Obj,
           // Disassemble a real instruction or a data when disassemble all is
           // provided
           MCInst Inst;
-          bool Disassembled =
-              DisAsm->getInstruction(Inst, Size, Bytes.slice(Index),
-                                     SectionAddr + Index, CommentStream);
+          ArrayRef<uint8_t> ThisBytes = Bytes.slice(Index);
+          uint64_t ThisAddr = SectionAddr + Index;
+          bool Disassembled = DisAsm->getInstruction(Inst, Size, ThisBytes,
+                                                     ThisAddr, CommentStream);
           if (Size == 0)
-            Size = 1;
+            Size = std::min(ThisBytes.size(),
+                            DisAsm->suggestBytesToSkip(ThisBytes, ThisAddr));
 
           LVP.update({Index, Section.getIndex()},
                      {Index + Size, Section.getIndex()}, Index + Size != End);