[llvm-objdump,ARM] Fix big-endian AArch32 disassembly.

The ABI for big-endian AArch32, as specified by AAELF32, is above- averagely complicated. Relocatable object files are expected to store instruction encodings in byte order matching the ELF file's endianness (so, big-endian for a BE ELF file). But executable images can //either// do that //or// store instructions little-endian regardless of data and ELF endianness (to support BE32 and BE8 platforms respectively). They signal the latter by setting the EF_ARM_BE8 flag in the ELF header. (In the case of the Thumb instruction set, this all means that each 16-bit halfword of a Thumb instruction is stored in one or other endianness. The two halfwords of a 32-bit Thumb instruction must appear in the same order no matter what, because the first halfword is the one that must avoid overlapping the encoding of any 16-bit Thumb instruction.) llvm-objdump was unconditionally expecting Arm instructions to be stored little-endian. So it would correctly disassemble a BE8 image, but if you gave it a BE32 image or a BE object file, it would retrieve every instruction in byte-swapped form and disassemble it to nonsense. (Even an object file output by LLVM itself, because ARMMCCodeEmitter outputs instructions big-endian in big-endian mode, which is correct for writing an object file.) This patch allows llvm-objdump to correctly disassemble all three of those classes of Arm ELF file. It does it by introducing a new SubtargetFeature for big-endian instructions, setting it from the ELF image type and flags during llvm-objdump setup, and teaching both ARMDisassembler and llvm-objdump itself to pay attention to it when retrieving instruction data from a section being disassembled. Differential Revision: https://reviews.llvm.org/D130902
author: Simon Tatham <simon.tatham@arm.com> 2022-08-01 13:40:32 +0100
committer: Simon Tatham <simon.tatham@arm.com> 2022-08-08 10:49:51 +0100
commit: 72017e9b16b737c5bd7c1dd33abff36f368fa724 (patch)
tree: 493cbdf0631efa94878df0f5310c8baaa7c8f9fa /llvm/tools/llvm-objdump/llvm-objdump.cpp
parent: 1eee6de873974f55538df976bf7802f019eac70a (diff)
download: llvm-72017e9b16b737c5bd7c1dd33abff36f368fa724.zip
llvm-72017e9b16b737c5bd7c1dd33abff36f368fa724.tar.gz
llvm-72017e9b16b737c5bd7c1dd33abff36f368fa724.tar.bz2
1 files changed, 32 insertions, 2 deletions
diff --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp
index fd83dc1..efac3a8 100644
--- a/llvm/tools/llvm-objdump/llvm-objdump.cpp
+++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp
@@ -690,14 +690,14 @@ public:
           OS << ' '
              << format_hex_no_prefix(
                     llvm::support::endian::read<uint16_t>(
-                        Bytes.data() + Pos, llvm::support::little),
+                        Bytes.data() + Pos, InstructionEndianness),
                     4);
       } else {
         for (; Pos + 4 <= End; Pos += 4)
           OS << ' '
              << format_hex_no_prefix(
                     llvm::support::endian::read<uint32_t>(
-                        Bytes.data() + Pos, llvm::support::little),
+                        Bytes.data() + Pos, InstructionEndianness),
                     8);
       }
       if (Pos < End) {
@@ -713,6 +713,13 @@ public:
     } else
       OS << "\t<unknown>";
   }
+
+  void setInstructionEndianness(llvm::support::endianness Endianness) {
+    InstructionEndianness = Endianness;
+  }
+
+private:
+  llvm::support::endianness InstructionEndianness = llvm::support::little;
 };
 ARMPrettyPrinter ARMPrettyPrinterInst;
 
@@ -1852,6 +1859,29 @@ static void disassembleObject(ObjectFile *Obj, bool InlineRelocs) {
   if (MCPU.empty())
     MCPU = Obj->tryGetCPUName().value_or("").str();
 
+  if (isArmElf(*Obj)) {
+    // When disassembling big-endian Arm ELF, the instruction endianness is
+    // determined in a complex way. In relocatable objects, AAELF32 mandates
+    // that instruction endianness matches the ELF file endianness; in
+    // executable images, that's true unless the file header has the EF_ARM_BE8
+    // flag, in which case instructions are little-endian regardless of data
+    // endianness.
+    //
+    // We must set the big-endian-instructions SubtargetFeature to make the
+    // disassembler read the instructions the right way round, and also tell
+    // our own prettyprinter to retrieve the encodings the same way to print in
+    // hex.
+    const auto *Elf32BE = dyn_cast<ELF32BEObjectFile>(Obj);
+
+    if (Elf32BE && (Elf32BE->isRelocatableObject() ||
+                    !(Elf32BE->getPlatformFlags() & ELF::EF_ARM_BE8))) {
+      Features.AddFeature("+big-endian-instructions");
+      ARMPrettyPrinterInst.setInstructionEndianness(llvm::support::big);
+    } else {
+      ARMPrettyPrinterInst.setInstructionEndianness(llvm::support::little);
+    }
+  }
+
   std::unique_ptr<const MCSubtargetInfo> STI(
       TheTarget->createMCSubtargetInfo(TripleName, MCPU, Features.getString()));
   if (!STI)
author	Simon Tatham <simon.tatham@arm.com>	2022-08-01 13:40:32 +0100
committer	Simon Tatham <simon.tatham@arm.com>	2022-08-08 10:49:51 +0100
commit	72017e9b16b737c5bd7c1dd33abff36f368fa724 (patch)
tree	493cbdf0631efa94878df0f5310c8baaa7c8f9fa /llvm/tools/llvm-objdump/llvm-objdump.cpp
parent	1eee6de873974f55538df976bf7802f019eac70a (diff)
download	llvm-72017e9b16b737c5bd7c1dd33abff36f368fa724.zip llvm-72017e9b16b737c5bd7c1dd33abff36f368fa724.tar.gz llvm-72017e9b16b737c5bd7c1dd33abff36f368fa724.tar.bz2